jackjlli commented on a change in pull request #8242:
URL: https://github.com/apache/pinot/pull/8242#discussion_r816176767
##########
File path:
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java
##########
@@ -2963,6 +2961,25 @@ public String startReplaceSegments(String tableNameWithType, List<String> segmen
return segmentLineageEntryId;
}
+ // TODO: Add more conflict checks over segmentsTo later. For example, for APPEND table,
+ // if the new segments from 2 batch jobs are overlapping, we reject one of job.
+ private static boolean isConflicted(List<String> segmentsFrom, LineageEntry lineageEntry, TableConfig tableConfig) {
+ if (!segmentsFrom.isEmpty()) {
+ // It's conflicted if there is any overlap between segmentsFrom.
+ return !Collections.disjoint(segmentsFrom, lineageEntry.getSegmentsFrom());
+ }
+ // For REFRESH table, it's conflicted if both segmentsFrom are empty.
+ if (isRefreshTable(tableConfig)) {
Review comment:
To elaborate on why multiple segment ingestion tasks cannot run in parallel
for a refresh table: it mainly comes down to how the segment name is
generated.
For a refresh table, no time column is specified, so the segment name contains
no min/max time values; the generated segment name looks like
`testTable_postfix_0`. The suffix `_0` only identifies the segment within a
single batch ingestion job.
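To illustrate the naming point, here is a minimal sketch. It assumes the name is simply `<table>_<postfix>_<sequenceId>`; `segmentNamesForJob` is a hypothetical helper written for this comment, not Pinot's actual segment name generator.

```java
import java.util.ArrayList;
import java.util.List;

class RefreshSegmentNameSketch {
  // Hypothetical helper (assumption): builds names for a table without a time
  // column, so only the table name, a postfix, and a per-job sequence id are used.
  static List<String> segmentNamesForJob(String tableName, String postfix, int numSegments) {
    List<String> names = new ArrayList<>();
    for (int sequenceId = 0; sequenceId < numSegments; sequenceId++) {
      names.add(tableName + "_" + postfix + "_" + sequenceId);
    }
    return names;
  }

  public static void main(String[] args) {
    // Two independent batch jobs for the same refresh table.
    List<String> jobA = segmentNamesForJob("testTable", "postfix", 2);
    List<String> jobB = segmentNamesForJob("testTable", "postfix", 2);
    System.out.println(jobA);              // prints [testTable_postfix_0, testTable_postfix_1]
    System.out.println(jobA.equals(jobB)); // prints true: the names carry nothing job-specific
  }
}
```

Under this assumption, both jobs emit identical segment names, so the names alone cannot tell the jobs (or their input files) apart.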
Since there is no relationship between the input raw file names and the
output segment names, there is no way to tell which raw file maps to which
output segment within a batch. Thus, there is no way to backfill only a subset
of segments for a refresh table: to update some data in one segment, we need to
replace the table's whole data, because we don't know which segment actually
needs to be replaced with the new data.
That's why, if two ingestion tasks run in parallel for the same refresh table,
one of them must be rejected: either the latter push job, which detects the
same segment names in the `segmentsFrom` field when forceCleanup = false, or
the former push job when forceCleanup = true, so that its lineage state is
marked as `reverted`.
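As a rough sketch of the check discussed here: the version below mirrors the visible part of the diff but simplifies the types, taking the lineage entry's `segmentsFrom` as a plain list and the table type as a boolean. The final `return` is my assumption about how the refresh branch completes, based on the code comment "it's conflicted if both segmentsFrom are empty"; it is not copied from the PR.

```java
import java.util.Collections;
import java.util.List;

class ConflictCheckSketch {
  // Simplified stand-in for isConflicted(...); parameter types are assumptions.
  static boolean isConflicted(List<String> segmentsFrom, List<String> entrySegmentsFrom,
      boolean isRefreshTable) {
    if (!segmentsFrom.isEmpty()) {
      // Conflict if the two replacement requests touch any common source segment.
      return !Collections.disjoint(segmentsFrom, entrySegmentsFrom);
    }
    // Assumed completion: for a refresh table, two requests that both have an
    // empty segmentsFrom are treated as conflicting.
    return isRefreshTable && entrySegmentsFrom.isEmpty();
  }

  public static void main(String[] args) {
    List<String> inProgress = List.of("testTable_postfix_0", "testTable_postfix_1");
    System.out.println(isConflicted(List.of("testTable_postfix_0"), inProgress, true)); // prints true
    System.out.println(isConflicted(List.of("testTable_postfix_5"), inProgress, true)); // prints false
    System.out.println(isConflicted(List.of(), List.of(), true));                       // prints true
    System.out.println(isConflicted(List.of(), List.of(), false));                      // prints false
  }
}
```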