----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72193/#review219795 -----------------------------------------------------------
How feasible would it be to launch this process from CompactorMR instead of the QueryCompactor implementations, and fall back to QueryCompactors if it fails? Because it's not exactly a "query compaction" but more of a "FileMergeCompaction"... ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/OrcFileMerger.java Lines 39 (patched) <https://reviews.apache.org/r/72193/#comment308020> Should this class be public? ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/OrcFileMerger.java Lines 62-64 (patched) <https://reviews.apache.org/r/72193/#comment308021> If the readers aren't compatible, compaction fails silently? Consider returning a boolean or throwing an exception and falling back to query-based compaction. ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/OrcFileMerger.java Lines 67 (patched) <https://reviews.apache.org/r/72193/#comment308024> Does the Reader need to be closed as well? Not sure ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/OrcFileMerger.java Lines 76 (patched) <https://reviews.apache.org/r/72193/#comment308025> nit: if nextBatch(batch) is true, will batch ever be null? Idk, at the end of the day I guess it's good to be on the safe side:) ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/OrcFileMerger.java Lines 104 (patched) <https://reviews.apache.org/r/72193/#comment308023> Output path should be logged once, not every time setupWriter is called, unless outPath's value changes somewhere... ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactor.java Lines 220 (patched) <https://reviews.apache.org/r/72193/#comment308015> Since there's no need to check for delete dirs for insert-only tables, maybe (a) skip the delete dir check in this function or (b) split this function into 2: hasDeletes and hasAborted ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactor.java Lines 227 (patched) <https://reviews.apache.org/r/72193/#comment308014> Wouldn't it be more efficient to check !directory.getAbortedDirectories().isEmpty() at the very beginning of this function, before starting all the streaming and filtering? ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactor.java Lines 293 (patched) <https://reviews.apache.org/r/72193/#comment308016> Idea: Could use this in org.apache.hadoop.hive.ql.txn.compactor.MinorQueryCompactor#getCreateQueries for delta creation. Or possibly for both deltas and delete deltas, if you added a param. ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactor.java Lines 314-315 (patched) <https://reviews.apache.org/r/72193/#comment308022> Consider debug/info log message about merge starting ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactor.java Lines 318 (patched) <https://reviews.apache.org/r/72193/#comment308019> nit: rename e to something meaningful like listOfDeltaPaths - Karen Coppage On March 4, 2020, 3:31 p.m., Laszlo Pinter wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72193/ > ----------------------------------------------------------- > > (Updated March 4, 2020, 3:31 p.m.) > > > Review request for hive, Karen Coppage and Peter Vary. > > > Repository: hive-git > > > Description > ------- > > HIVE-22977: Merge delta files instead of running a query in major/minor > compaction > > > Diffs > ----- > > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorOnTezTest.java > 78174f345b35709cd654aa81578ab598e0d9ed9c > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java > 9659a3f0481dcb2446b197688459f0c1dba867fa > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestMmCompactorOnTez.java > 074430ce7fa0f0617e8fb50c334c14f33cc74d8a > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java > c44f2b5026558c9f0a7d6fa03cb6950f24b77da2 > > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MajorQueryCompactor.java > 93850807137a4cfbd49beb256624b11801bd08d1 > > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MinorQueryCompactor.java > 01cd2fc93d12002249253added06df70b0c40181 > > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MmMajorQueryCompactor.java > 41fdd7e210bfc42c3e41e9f1240d34a51add33a9 > > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MmMinorQueryCompactor.java > feb667cba960c0fdd19c030235eb31ebddfa7ca1 > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/OrcFileMerger.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactor.java > 7a9e48ff1eb88f65cc41165b3908f3dee9245f61 > > > Diff: https://reviews.apache.org/r/72193/diff/2/ > > > Testing > ------- > > > Thanks, > > Laszlo Pinter > >