[ 
https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226872#comment-14226872
 ] 

Jihong Liu commented on HIVE-8966:
----------------------------------

Yes. Closed the transaction batch. I suggest making either of the following two 
updates, or both:

1. If a file is not a bucket file, don't try to compact it. Update the 
following code in org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.java.
  Change:

  private void addFileToMap(Matcher matcher, Path file, boolean sawBase,
                            Map<Integer, BucketTracker> splitToBucketMap) {
    if (!matcher.find()) {
      LOG.warn("Found a non-bucket file that we thought matched the bucket pattern! " +
          file.toString());
    }
    .....

 to:

  private void addFileToMap(Matcher matcher, Path file, boolean sawBase,
                            Map<Integer, BucketTracker> splitToBucketMap) {
    if (!matcher.find()) {
      LOG.warn("Found a non-bucket file that we thought matched the bucket pattern! " +
          file.toString());
      return;
    }
    ....
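To see why the early return matters, here is a minimal, self-contained sketch. The regex below is a hypothetical stand-in for the compactor's actual bucket-file pattern, chosen only to illustrate the point: the streaming side file does not match, so without the early return the non-bucket file falls through and is still processed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BucketPatternDemo {
    // Hypothetical stand-in for the compactor's bucket-file pattern
    static final Pattern BUCKET = Pattern.compile("bucket_(\\d+)$");

    public static void main(String[] args) {
        // The streaming side file ends in "_flush_length", not digits,
        // so find() fails and the file should be skipped
        Matcher side = BUCKET.matcher("bucket_00000_flush_length");
        System.out.println("side file matches: " + side.find());   // false

        // A real bucket file matches, and group(1) is then safe to read
        Matcher bucket = BUCKET.matcher("bucket_00000");
        if (bucket.find()) {
            System.out.println("bucket id: " + bucket.group(1));   // 00000
        }
    }
}
```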

2. Don't use the bucket-file naming pattern for the "flush_length" side file. 
Update the following code in org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.java.
 Change:

  static Path getSideFile(Path main) {
    return new Path(main + "_flush_length");
  }

to:

  static Path getSideFile(Path main) {
    String name = main.getName();
    if (name.startsWith("bucket_")) {
      // rename so the side file no longer matches the bucket-file pattern
      return new Path(main.getParent(), "bkt" + name.substring(6) + "_flush_length");
    } else {
      return new Path(main + "_flush_length");
    }
  }
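A quick string-level check of the renaming above (no HDFS needed; the helper is illustrative only, not part of Hive): a file named bucket_00000 gets the side file bkt_00000_flush_length, which no longer starts with "bucket_" and therefore cannot be mistaken for a bucket file by the compactor.

```java
public class SideFileNameDemo {
    // Illustrative string-level version of the proposed getSideFile() renaming
    static String sideFileName(String name) {
        if (name.startsWith("bucket_")) {
            // "bucket_00000" -> "bkt_00000_flush_length"
            return "bkt" + name.substring(6) + "_flush_length";
        }
        return name + "_flush_length";
    }

    public static void main(String[] args) {
        System.out.println(sideFileName("bucket_00000")); // bkt_00000_flush_length
        System.out.println(sideFileName("data_file"));    // data_file_flush_length
    }
}
```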
 
After applying the above updates and re-compiling hive-exec.jar, the compaction 
works fine.


> Delta files created by hive hcatalog streaming cannot be compacted
> ------------------------------------------------------------------
>
>                 Key: HIVE-8966
>                 URL: https://issues.apache.org/jira/browse/HIVE-8966
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.14.0
>         Environment: hive
>            Reporter: Jihong Liu
>            Assignee: Alan Gates
>            Priority: Critical
>
> hive hcatalog streaming will also create a file like bucket_n_flush_length in 
> each delta directory, where "n" is the bucket number. But 
> compactor.CompactorMR thinks this file also needs to be compacted. Of course 
> this file cannot be compacted, so compactor.CompactorMR will not 
> continue with the compaction. 
> In a test, after removing the bucket_n_flush_length file, the "alter 
> table ... partition ... compact" finished successfully. If that file is not 
> deleted, nothing is compacted. 
> This is probably a very severe bug. Both 0.13 and 0.14 have this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
