klcopp commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r855867715
##########
ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:
##########
@@ -303,8 +303,15 @@ void setWriteIdForAcidFileSinks() throws SemanticException, LockException {
private void allocateWriteIdForAcidAnalyzeTable() throws LockException {
if (driverContext.getPlan().getAcidAnalyzeTable() != null) {
+      //Inside a compaction transaction only stats gathering runs, which does not require a new write id,
+      //and for duplicate compaction detection it must not be incremented.
+      boolean isWithinCompactionTxn = Boolean.parseBoolean(SessionState.get().getHiveVariables().get(Constants.INSIDE_COMPACTION_TRANSACTION_FLAG));
Review Comment:
I think you can use driverContext.getTxnType() instead and check for TxnType.COMPACTION.
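A minimal sketch of what I mean, assuming driverContext is in scope here and getTxnType() is non-null (it appears to default to TxnType.DEFAULT):

```java
import org.apache.hadoop.hive.metastore.api.TxnType;

// Derive the flag from the transaction type the driver already tracks,
// instead of round-tripping through a session-level hive variable.
boolean isWithinCompactionTxn = driverContext.getTxnType() == TxnType.COMPACTION;
```

This also makes the intent clearer than relying on Boolean.parseBoolean(null) returning false when the session variable is absent.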
##########
ql/src/test/org/apache/hadoop/hive/ql/TestTxnNoBuckets.java:
##########
@@ -797,7 +797,6 @@ public void testCompactStatsGather() throws Exception {
int[][] targetVals2 = {{5, 1, 1}, {5, 2, 2}, {5, 3, 1}, {5, 4, 2}};
runStatementOnDriver("insert into T partition(p=1,q) " +
makeValuesClause(targetVals2));
-
Review Comment:
Nit: Unnecessary change to this file
##########
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java:
##########
@@ -122,6 +125,56 @@ public void testCompactionShouldNotFailOnPartitionsWithBooleanField() throws Exc
"ready for cleaning", compacts.get(0).getState());
}
+  @Test
+  public void secondCompactionShouldBeRefusedBeforeEnqueueing() throws Exception {
+    conf.setBoolVar(HiveConf.ConfVars.COMPACTOR_CRUD_QUERY_BASED, true);
+    // Set delta number threshold to 2 to avoid skipping compaction because of too few deltas
+    conf.setIntVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_NUM_THRESHOLD, 2);
+    // Set delta percentage to a high value to suppress selecting major compaction based on that
+    conf.setFloatVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD, 1000f);
Review Comment:
These two settings aren't necessary: only the Initiator uses these thresholds, and in this test compaction is always queued manually.
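For context, a hedged sketch of queueing compaction directly against the txn store, bypassing the Initiator entirely; the table name and the txnHandler field are placeholders following CompactorTest conventions, not necessarily the exact names in this class:

```java
import org.apache.hadoop.hive.metastore.api.CompactionRequest;
import org.apache.hadoop.hive.metastore.api.CompactionResponse;
import org.apache.hadoop.hive.metastore.api.CompactionType;

// Enqueue a major compaction directly, so the delta num/pct thresholds
// checked by the Initiator are never consulted.
CompactionRequest rqst = new CompactionRequest("default", "test_table", CompactionType.MAJOR);
CompactionResponse resp = txnHandler.compact(rqst);
// The duplicate detection should then refuse a second identical request
// before enqueueing it, e.g.:
// assertFalse(txnHandler.compact(rqst).isAccepted());
```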
##########
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/CompactorTest.java:
##########
@@ -38,6 +38,7 @@
import org.apache.hadoop.hive.metastore.api.AllocateTableWriteIdsResponse;
import org.apache.hadoop.hive.metastore.api.CommitTxnRequest;
import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionResponse;
Review Comment:
Nit: Unused import
##########
ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java:
##########
@@ -235,18 +235,18 @@ private void loadData(boolean isVectorized) throws Exception {
     runStatementOnDriver("export table Tstage to '" + getWarehouseDir() + "/2'");
     runStatementOnDriver("load data inpath '" + getWarehouseDir() + "/2/data' overwrite into table T");
     String[][] expected3 = new String[][] {
-      {"{\"writeid\":5,\"bucketid\":536870912,\"rowid\":0}\t5\t6", "t/base_0000005/000000_0"},
Review Comment:
Just trying to understand: why was the writeid 5 originally? And if there was no compaction in the meantime, why is it 4 now?
##########
ql/src/test/queries/clientpositive/acid_insert_overwrite_update.q:
##########
@@ -26,7 +26,6 @@ insert overwrite table sequential_update values(current_timestamp, 0, current_ti
 delete from sequential_update where seq=2;
 select distinct IF(seq==0, 'LOOKS OKAY', 'BROKEN'), regexp_extract(INPUT__FILE__NAME, '.*/(.*)/[^/]*', 1) from sequential_update;
-alter table sequential_update compact 'major';
Review Comment:
Why change this?