[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226872#comment-14226872 ]
Jihong Liu commented on HIVE-8966: ---------------------------------- Yes, the transaction batch was closed. I suggest making either of the following two updates, or both:

1. If a file is not a bucket file, do not try to compact it. In org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.java, change:

{code}
private void addFileToMap(Matcher matcher, Path file, boolean sawBase,
                          Map<Integer, BucketTracker> splitToBucketMap) {
  if (!matcher.find()) {
    LOG.warn("Found a non-bucket file that we thought matched the bucket pattern! " +
        file.toString());
  }
  ...
{code}

to:

{code}
private void addFileToMap(Matcher matcher, Path file, boolean sawBase,
                          Map<Integer, BucketTracker> splitToBucketMap) {
  if (!matcher.find()) {
    LOG.warn("Found a non-bucket file that we thought matched the bucket pattern! " +
        file.toString());
    return;
  }
  ...
{code}

2. Do not use the bucket-file naming pattern for the "flush_length" side file. In org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.java, change:

{code}
static Path getSideFile(Path main) {
  return new Path(main + "_flush_length");
}
{code}

to:

{code}
static Path getSideFile(Path main) {
  if (main.getName().startsWith("bucket_")) {
    // Rename the side file so its name no longer matches the bucket pattern.
    return new Path(main.getParent(),
        "bkt" + main.getName().substring(6) + "_flush_length");
  } else {
    return new Path(main + "_flush_length");
  }
}
{code}

After making these updates and recompiling hive-exec.jar, the compaction works fine.

> Delta files created by hive hcatalog streaming cannot be compacted
> ------------------------------------------------------------------
>
> Key: HIVE-8966
> URL: https://issues.apache.org/jira/browse/HIVE-8966
> Project: Hive
> Issue Type: Bug
> Components: HCatalog
> Affects Versions: 0.14.0
> Environment: hive
> Reporter: Jihong Liu
> Assignee: Alan Gates
> Priority: Critical
>
> hive hcatalog streaming will also create a file like bucket_n_flush_length in
> each delta directory, where "n" is the bucket number.
> But compactor.CompactorMR thinks this file also needs to be compacted. Of
> course this file cannot be compacted, so compactor.CompactorMR will not
> continue with the compaction.
> In a test, after removing the bucket_n_flush_length file, the "alter table
> partition compact" statement finished successfully. If that file is not
> deleted, nothing is compacted.
> This is probably a high-severity bug. Both 0.13 and 0.14 have this issue.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
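The root cause described above can be illustrated with a short standalone check: a regex of the shape the compactor uses to pick out bucket files (the exact pattern inside CompactorMR may differ; this one is an assumption for illustration) also matches the streaming side file, while the renamed side file produced by fix 2 does not:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BucketPatternDemo {
    // Illustrative stand-in for the bucket-file pattern the compactor
    // matches file names against (assumed shape, not copied from Hive).
    static final Pattern BUCKET = Pattern.compile("bucket_(\\d+)");

    static boolean looksLikeBucketFile(String fileName) {
        Matcher m = BUCKET.matcher(fileName);
        return m.find();
    }

    public static void main(String[] args) {
        // A real bucket file matches, as intended.
        System.out.println(looksLikeBucketFile("bucket_00000"));              // true
        // The streaming side file also matches, which is why the
        // compactor picks it up and then fails on it.
        System.out.println(looksLikeBucketFile("bucket_00000_flush_length")); // true
        // After fix 2 renames the side file, it no longer matches.
        System.out.println(looksLikeBucketFile("bkt_00000_flush_length"));    // false
    }
}
```

This is why either fix works: fix 1 makes the compactor skip any non-matching file it stumbles on, while fix 2 keeps the side file out of the matched set in the first place.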