deniskuzZ commented on a change in pull request #2579:
URL: https://github.com/apache/hive/pull/2579#discussion_r686955507
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
##########
@@ -234,12 +234,24 @@ void run(HiveConf conf, String jobName, Table t, Partition p, StorageDescriptor
           "especially if this message repeats. Check that compaction is running properly. Check for any " +
           "runaway/mis-configured process writing to ACID tables, especially using Streaming Ingest API.");
       int numMinorCompactions = parsedDeltas.size() / maxDeltasToHandle;
+      parsedDeltas.sort(AcidUtils.ParsedDeltaLight::compareTo);
+
+      int start = 0;
+      int end = maxDeltasToHandle;
+
       for (int jobSubId = 0; jobSubId < numMinorCompactions; jobSubId++) {
+        while (parsedDeltas.get(end).getMinWriteId() == parsedDeltas.get(end - 1).getMinWriteId() &&
+            parsedDeltas.get(end).getMaxWriteId() == parsedDeltas.get(end - 1).getMaxWriteId()) {
Review comment:
This drops deltas for statements that spill over the batch size (e.g. [... delta_5_5_1] [delta_5_5_2, ...]). We should process multiple statement deltas for the same writeId range in the same batch; in this scenario delta_5_5_1 should be included in the next batch.
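The batching rule described above can be sketched as follows. This is a minimal, hypothetical stand-in: the `Delta` record replaces `AcidUtils.ParsedDeltaLight` with just the fields needed here, and `DeltaBatcher.batch` is an illustrative name, not the actual CompactorMR code. The idea is to pull the batch boundary back whenever it would split a group of statement deltas sharing the same (minWriteId, maxWriteId) range, so the whole group lands in the next batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified stand-in for AcidUtils.ParsedDeltaLight:
// only the fields needed to illustrate batching.
record Delta(long minWriteId, long maxWriteId, int statementId) {}

public class DeltaBatcher {

  // Sketch of the batching idea from the review: instead of splitting a
  // multi-statement delta group (same writeId range) across two batches,
  // move the boundary back so the whole group goes into the next batch.
  // Assumes 'deltas' is already sorted by (minWriteId, maxWriteId, statementId).
  static List<List<Delta>> batch(List<Delta> deltas, int maxDeltasToHandle) {
    List<List<Delta>> batches = new ArrayList<>();
    int start = 0;
    while (start < deltas.size()) {
      int end = Math.min(start + maxDeltasToHandle, deltas.size());
      // Pull the boundary back while it would cut through a writeId-range group.
      // The end > start + 1 guard avoids producing an empty batch when a single
      // group is larger than maxDeltasToHandle (then we split it anyway).
      while (end > start + 1 && end < deltas.size()
          && deltas.get(end).minWriteId() == deltas.get(end - 1).minWriteId()
          && deltas.get(end).maxWriteId() == deltas.get(end - 1).maxWriteId()) {
        end--;
      }
      batches.add(new ArrayList<>(deltas.subList(start, end)));
      start = end;
    }
    return batches;
  }
}
```

With a batch size of 4 and deltas covering ranges 1-1, 2-2, 3-3, 5-5 (stmt 1), 5-5 (stmt 2), 6-6, the first batch stops after 3-3 and both 5-5 statement deltas go into the second batch, matching the behavior the reviewer asks for.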
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]