----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69367/#review210589 -----------------------------------------------------------
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java Lines 2685 (patched) <https://reviews.apache.org/r/69367/#comment295290> "And minor compaction will be disabled." - should make sure Initiator doesn't start minor and that Alter Table commands requesting Minor are no-op or throw so that these don't get into the compactor queue. We should also, perhaps think about how Initiator triggers Major compactions - are current config params adequate? Should do at least the 2nd part in a follow up jira, maybe both. ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java Line 180 (original), 183 (patched) <https://reviews.apache.org/r/69367/#comment295291> I guess all this should be no-op for compactor since it only looks at 1 partition at a time and for acid serde and IF/OF don't change. ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java Lines 197 (patched) <https://reviews.apache.org/r/69367/#comment295292> bucketSplitMultiMap? ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java Lines 206 (patched) <https://reviews.apache.org/r/69367/#comment295293> the error should include table name if easily available here or if not maybe a file path from any of the splits... ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java Lines 214 (patched) <https://reviews.apache.org/r/69367/#comment295294> should we assert that schemaSplitMultiMap has size=1 since that is what we expect for compactor? ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java Lines 276 (patched) <https://reviews.apache.org/r/69367/#comment295295> Add a comment that this is trully a bucketId (rather than bucket property - BucketCodec.java since 3.0) that is derived from file name WriteId is also from containing file name and for files that have min/max wrieid, it's the starting one. Now that I look at the code in TransactionMetadata.findWriteIDForSynthetcRowIDs() - the assert there will throw. It should be removed since where we have to handle files that come from compacted dirs so min <> max for all deltas. maybe these comments should be on OrcSplit where getter methods are defined. ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java Lines 68 (patched) <https://reviews.apache.org/r/69367/#comment295296> mark these transient for clarity since we don't serialize them ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 245 (patched) <https://reviews.apache.org/r/69367/#comment295297> Ideally this should be prevented before it gets into the compction_queue. throwing here will cause failed compactions to accumulate in SHOW COMPACTIONS and prevent auto-scheduling of new ones. ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 399 (patched) <https://reviews.apache.org/r/69367/#comment295298> should this be in a finally{}? SessionState is threadLocal so it may get reused... or do we shutdown the session each time? ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 481 (patched) <https://reviews.apache.org/r/69367/#comment295299> current write id should always be the same as original. Only delete event can have these be different but major compaction absorbs delete events. ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 503 (patched) <https://reviews.apache.org/r/69367/#comment295300> what's the value of specifying location for tmp table? I'm surprised it's even legal. Would this be a security hole potentially? ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 510 (patched) <https://reviews.apache.org/r/69367/#comment295302> why overwrite? ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 513 (patched) <https://reviews.apache.org/r/69367/#comment295301> why do you need partition key/values in the query? we are always reading a single partition. This is achieved by getAcidState() which takes partition dir as input (i.e. all the files it returns are within a given partition) ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 542 (patched) <https://reviews.apache.org/r/69367/#comment295303> need to think about this. maybe it's ok... ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 565 (patched) <https://reviews.apache.org/r/69367/#comment295304> there should be something in AcidUtils to parse original bucket file name - Eugene Koifman On Nov. 15, 2018, 4:59 p.m., Vaibhav Gumashta wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/69367/ > ----------------------------------------------------------- > > (Updated Nov. 15, 2018, 4:59 p.m.) > > > Review request for hive and Eugene Koifman. > > > Bugs: HIVE-20699 > https://issues.apache.org/jira/browse/HIVE-20699 > > > Repository: hive-git > > > Description > ------- > > https://jira.apache.org/jira/browse/HIVE-20699 > > > Diffs > ----- > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65264f323f > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java > 40dd992455 > pom.xml 26b662e4c3 > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java > 92c74e1d06 > > > Diff: https://reviews.apache.org/r/69367/diff/1/ > > > Testing > ------- > > > Thanks, > > Vaibhav Gumashta > >