[
https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226872#comment-14226872
]
Jihong Liu commented on HIVE-8966:
----------------------------------
Yes. Closed the transaction batch. Suggest making either of the following two
updates, or both:
1. If a file is not a bucket file, don't try to compact it. Update the
following code:
in org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.java
Change the following code:
private void addFileToMap(Matcher matcher, Path file, boolean sawBase,
                          Map<Integer, BucketTracker> splitToBucketMap) {
  if (!matcher.find()) {
    LOG.warn("Found a non-bucket file that we thought matched the bucket pattern! " +
        file.toString());
  }
  .....
to:
private void addFileToMap(Matcher matcher, Path file, boolean sawBase,
                          Map<Integer, BucketTracker> splitToBucketMap) {
  if (!matcher.find()) {
    LOG.warn("Found a non-bucket file that we thought matched the bucket pattern! " +
        file.toString());
    return;
  }
  ....
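To see why the early return matters: a `find()` against the bucket-file pattern also matches inside "bucket_00000_flush_length", so without the return the side file is added to the map and fed into the compaction job. A minimal, self-contained demo (the class name and the exact regex are assumptions; the real constant lives elsewhere in Hive):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BucketPatternDemo {
    // Same shape as the compactor's bucket-file pattern; the exact
    // regex used by Hive is an assumption here.
    static final Pattern BUCKET = Pattern.compile("bucket_(\\d+)");

    static boolean looksLikeBucketFile(String fileName) {
        Matcher m = BUCKET.matcher(fileName);
        // find() succeeds on any substring match, so the side file
        // "bucket_00000_flush_length" also passes this check.
        return m.find();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeBucketFile("bucket_00000"));              // true
        System.out.println(looksLikeBucketFile("bucket_00000_flush_length")); // true: the trap
        System.out.println(looksLikeBucketFile("base_0000010"));              // false
    }
}
```

This is why the proposed patch treats a failed `find()` as "not a bucket file, skip it" rather than just logging a warning and falling through.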
2. Don't use the bucket file pattern to name the "flush_length" file. Update
the following code:
in org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.java
change the following code:
static Path getSideFile(Path main) {
  return new Path(main + "_flush_length");
}
to:
static Path getSideFile(Path main) {
  if (main.getName().startsWith("bucket_")) {
    return new Path(main.getParent(),
        "bkt" + main.getName().substring(6) + "_flush_length");
  }
  return new Path(main + "_flush_length");
}
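At the string level the rename works like this (a sketch using plain strings instead of Hadoop's Path, so it runs standalone; the class and method names are hypothetical):

```java
public class SideFileName {
    // Sketch of the proposed rename: "bucket_00000" becomes
    // "bkt_00000_flush_length", which the compactor's bucket_\d+
    // pattern no longer matches, so the side file is left alone.
    static String sideFileName(String bucketFileName) {
        if (bucketFileName.startsWith("bucket_")) {
            // substring(6) keeps the underscore and bucket number:
            // "bucket_00000" -> "_00000" -> "bkt_00000_flush_length"
            return "bkt" + bucketFileName.substring(6) + "_flush_length";
        }
        return bucketFileName + "_flush_length";
    }

    public static void main(String[] args) {
        System.out.println(sideFileName("bucket_00000")); // bkt_00000_flush_length
        System.out.println(sideFileName("data.orc"));     // data.orc_flush_length
    }
}
```

Either fix alone unblocks compaction; doing both makes the compactor robust to stray files and keeps the side file out of the bucket namespace.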
After making the above updates and re-compiling hive-exec.jar, the compaction
works fine now.
> Delta files created by hive hcatalog streaming cannot be compacted
> ------------------------------------------------------------------
>
> Key: HIVE-8966
> URL: https://issues.apache.org/jira/browse/HIVE-8966
> Project: Hive
> Issue Type: Bug
> Components: HCatalog
> Affects Versions: 0.14.0
> Environment: hive
> Reporter: Jihong Liu
> Assignee: Alan Gates
> Priority: Critical
>
> Hive hcatalog streaming will also create a file like bucket_n_flush_length in
> each delta directory, where "n" is the bucket number. The
> compactor.CompactorMR thinks this file also needs to be compacted. However,
> this file of course cannot be compacted, so compactor.CompactorMR will not
> continue with the compaction.
> Did a test: after removing the bucket_n_flush_length file, the "alter
> table partition compact" finished successfully. If that file is not deleted,
> nothing is compacted.
> This is probably a very severe bug. Both 0.13 and 0.14 have this issue.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)