[
https://issues.apache.org/jira/browse/HIVE-26107?focusedWorklogId=760707&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-760707
]
ASF GitHub Bot logged work on HIVE-26107:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 22/Apr/22 09:25
Start Date: 22/Apr/22 09:25
Worklog Time Spent: 10m
Work Description: klcopp commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r855867715
##########
ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:
##########
@@ -303,8 +303,15 @@ void setWriteIdForAcidFileSinks() throws SemanticException, LockException {
private void allocateWriteIdForAcidAnalyzeTable() throws LockException {
if (driverContext.getPlan().getAcidAnalyzeTable() != null) {
+      // Inside a compaction transaction only stats gathering runs, which does not require a new write id,
+      // and for duplicate compaction detection the write id must not be incremented.
+      boolean isWithinCompactionTxn = Boolean.parseBoolean(
+          SessionState.get().getHiveVariables().get(Constants.INSIDE_COMPACTION_TRANSACTION_FLAG));
Review Comment:
I think you can use driverContext.getTxnType() instead (TxnType.COMPACTION)
##########
ql/src/test/org/apache/hadoop/hive/ql/TestTxnNoBuckets.java:
##########
@@ -797,7 +797,6 @@ public void testCompactStatsGather() throws Exception {
int[][] targetVals2 = {{5, 1, 1}, {5, 2, 2}, {5, 3, 1}, {5, 4, 2}};
runStatementOnDriver("insert into T partition(p=1,q) " +
makeValuesClause(targetVals2));
-
Review Comment:
Nit: Unnecessary change to this file
##########
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java:
##########
@@ -122,6 +125,56 @@ public void testCompactionShouldNotFailOnPartitionsWithBooleanField() throws Exc
"ready for cleaning", compacts.get(0).getState());
}
+  @Test
+  public void secondCompactionShouldBeRefusedBeforeEnqueueing() throws Exception {
+    conf.setBoolVar(HiveConf.ConfVars.COMPACTOR_CRUD_QUERY_BASED, true);
+    // Set the delta number threshold to 2 to avoid skipping compaction because of too few deltas
+    conf.setIntVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_NUM_THRESHOLD, 2);
+    // Set the delta percentage to a high value to suppress selecting major compaction based on that
+    conf.setFloatVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD, 1000f);
Review Comment:
These two settings aren't necessary: these thresholds are only used by the Initiator, and in this test compaction is always queued manually
##########
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/CompactorTest.java:
##########
@@ -38,6 +38,7 @@
import org.apache.hadoop.hive.metastore.api.AllocateTableWriteIdsResponse;
import org.apache.hadoop.hive.metastore.api.CommitTxnRequest;
import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionResponse;
Review Comment:
Nit: Unused import
##########
ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java:
##########
@@ -235,18 +235,18 @@ private void loadData(boolean isVectorized) throws Exception {
     runStatementOnDriver("export table Tstage to '" + getWarehouseDir() + "/2'");
     runStatementOnDriver("load data inpath '" + getWarehouseDir() + "/2/data' overwrite into table T");
     String[][] expected3 = new String[][] {
-        {"{\"writeid\":5,\"bucketid\":536870912,\"rowid\":0}\t5\t6", "t/base_0000005/000000_0"},
Review Comment:
Just trying to understand – Why was the writeid 5 originally? And if there
was no compaction in the meantime, why is it 4 now?
##########
ql/src/test/queries/clientpositive/acid_insert_overwrite_update.q:
##########
@@ -26,7 +26,6 @@ insert overwrite table sequential_update values(current_timestamp, 0, current_ti
 delete from sequential_update where seq=2;
 select distinct IF(seq==0, 'LOOKS OKAY', 'BROKEN'), regexp_extract(INPUT__FILE__NAME, '.*/(.*)/[^/]*', 1) from sequential_update;
-alter table sequential_update compact 'major';
Review Comment:
Why change this?
Issue Time Tracking
-------------------
Worklog Id: (was: 760707)
Time Spent: 20m (was: 10m)
> Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue
> --------------------------------------------------------------------------------------------------
>
> Key: HIVE-26107
> URL: https://issues.apache.org/jira/browse/HIVE-26107
> Project: Hive
> Issue Type: Improvement
> Reporter: László Végh
> Assignee: László Végh
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> How to reproduce:
> 1) create an acid table and load some data;
> 2) manually trigger compaction for the table several times;
> 3) inspect the compaction_queue: there are multiple entries in 'ready for
> cleaning' state for the same table.
>
> Expected behavior: all compaction requests after the first one should be
> rejected until the table is changed again.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)