deniskuzZ commented on a change in pull request #2579:
URL: https://github.com/apache/hive/pull/2579#discussion_r686955507
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
##########
@@ -234,12 +234,24 @@ void run(HiveConf conf, String jobName, Table t, Partition p, StorageDescriptor
           "especially if this message repeats. Check that compaction is running properly. Check for any " +
           "runaway/mis-configured process writing to ACID tables, especially using Streaming Ingest API.");
       int numMinorCompactions = parsedDeltas.size() / maxDeltasToHandle;
+      parsedDeltas.sort(AcidUtils.ParsedDeltaLight::compareTo);
+
+      int start = 0;
+      int end = maxDeltasToHandle;
+
       for (int jobSubId = 0; jobSubId < numMinorCompactions; jobSubId++) {
+        while (parsedDeltas.get(end).getMinWriteId() == parsedDeltas.get(end - 1).getMinWriteId() &&
+            parsedDeltas.get(end).getMaxWriteId() == parsedDeltas.get(end - 1).getMaxWriteId()) {
Review comment:
This drops deltas for statements that spill over the batch size (e.g. [... delta_5_5_1] [delta_5_5_2, ...]). We should process multiple statement deltas for the same writeId range in the same batch; in this scenario delta_5_5_1 should be included in the next batch.
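The batching rule described above can be sketched as follows. This is a minimal, hypothetical stand-in: the `Delta` record replaces `AcidUtils.ParsedDeltaLight` with just the fields needed here, and `DeltaBatcher.batch` is an illustrative name, not the actual CompactorMR code. The idea is to pull the batch boundary back whenever it would split a group of statement deltas sharing the same (minWriteId, maxWriteId) range, so the whole group lands in the next batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified stand-in for AcidUtils.ParsedDeltaLight:
// only the fields needed to illustrate batching.
record Delta(long minWriteId, long maxWriteId, int statementId) {}

public class DeltaBatcher {

  // Sketch of the batching idea from the review: instead of splitting a
  // multi-statement delta group (same writeId range) across two batches,
  // move the boundary back so the whole group goes into the next batch.
  // Assumes 'deltas' is already sorted by (minWriteId, maxWriteId, statementId).
  static List<List<Delta>> batch(List<Delta> deltas, int maxDeltasToHandle) {
    List<List<Delta>> batches = new ArrayList<>();
    int start = 0;
    while (start < deltas.size()) {
      int end = Math.min(start + maxDeltasToHandle, deltas.size());
      // Pull the boundary back while it would cut through a writeId-range group.
      // The end > start + 1 guard avoids producing an empty batch when a single
      // group is larger than maxDeltasToHandle (then we split it anyway).
      while (end > start + 1 && end < deltas.size()
          && deltas.get(end).minWriteId() == deltas.get(end - 1).minWriteId()
          && deltas.get(end).maxWriteId() == deltas.get(end - 1).maxWriteId()) {
        end--;
      }
      batches.add(new ArrayList<>(deltas.subList(start, end)));
      start = end;
    }
    return batches;
  }
}
```

With a batch size of 4 and deltas covering ranges 1-1, 2-2, 3-3, 5-5 (stmt 1), 5-5 (stmt 2), 6-6, the first batch stops after 3-3 and both 5-5 statement deltas go into the second batch, matching the behavior the reviewer asks for.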
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]