jon-wei commented on a change in pull request #6724: Fix issue that tasks
failed because of no sink for identifier
URL: https://github.com/apache/incubator-druid/pull/6724#discussion_r244906073
##########
File path:
server/src/main/java/org/apache/druid/segment/realtime/appenderator/AppenderatorImpl.java
##########
@@ -485,8 +486,12 @@ public void clear() throws InterruptedException
final List<Pair<FireHydrant, SegmentIdentifier>> indexesToPersist = new ArrayList<>();
int numPersistedRows = 0;
long bytesPersisted = 0L;
- for (SegmentIdentifier identifier : sinks.keySet()) {
- final Sink sink = sinks.get(identifier);
+ Iterator<Map.Entry<SegmentIdentifier, Sink>> iterator = sinks.entrySet().iterator();
+
+ while (iterator.hasNext()) {
Review comment:
@QiuMM
> I have never observed any exceptions caused by this. And I think there is
> no need to worry about it because the program will wait for any outstanding
> pushes to finish, then abandon the segment inside the persist thread:
I believe you're right that `push` is okay.
In the `persistAll` case I'm guessing what's happening is something like:
1. Task does an incremental publish for some sequence
2. Publish completes, `driver.registerHandoff` adds a callback after handoff
finishes that calls `appenderator.drop` -> `abandonSegment`
3. Task does another incremental publish for another sequence
4. Task calls `persistAll`, which iterates through all Sinks, but during this
iteration the handoff callback from the previous incremental publish finishes
and removes a Sink
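
To make the suspected interleaving in the steps above concrete, here is a minimal single-threaded sketch. It assumes `sinks` behaves like a `ConcurrentHashMap`; the class name, the `String` stand-ins for `SegmentIdentifier` and `Sink`, and the `hook` parameter that simulates the handoff callback are all hypothetical and are not Druid code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

public class SinkIterationSketch {
    // Old pattern: take a key from keySet(), then look the sink up again.
    // The hook stands in for a handoff callback that may drop the sink in between.
    static List<String> missingWithKeyLookup(Map<String, String> sinks, Consumer<String> hook) {
        List<String> missing = new ArrayList<>();
        for (String identifier : sinks.keySet()) {
            hook.accept(identifier);
            String sink = sinks.get(identifier); // null if the entry was removed meanwhile
            if (sink == null) {
                missing.add(identifier);
            }
        }
        return missing;
    }

    // Entry-based pattern: the entry handed out by the iterator already carries
    // the value, so a later removal does not turn it into null.
    static List<String> missingWithEntryIteration(Map<String, String> sinks, Consumer<String> hook) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> entry : sinks.entrySet()) {
            hook.accept(entry.getKey());
            String sink = entry.getValue();
            if (sink == null) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }

    static Map<String, String> newSinks() {
        Map<String, String> sinks = new ConcurrentHashMap<>();
        sinks.put("seg-1", "sink-1");
        sinks.put("seg-2", "sink-2");
        return sinks;
    }

    public static void main(String[] args) {
        Map<String, String> a = newSinks();
        // Simulated callback: drop seg-1 at the moment the loop is visiting it.
        System.out.println(missingWithKeyLookup(a, id -> { if (id.equals("seg-1")) a.remove(id); }));

        Map<String, String> b = newSinks();
        System.out.println(missingWithEntryIteration(b, id -> { if (id.equals("seg-1")) b.remove(id); }));
    }
}
```

With the key lookup, the dropped identifier shows up as a missing sink; with entry iteration the loop still holds the sink it was handed. Whether it is safe to keep working on a sink that is being abandoned is a separate question this sketch does not answer.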
For `push`, the input Sinks are read from the Sequence->Segments map instead
of from the entire set of Sinks, and `StreamAppenderatorDriver.publish`
removes the sequence's segments after the publish. So the next incremental
publish doesn't have the same problem of accessing Sinks that were
handled/dropped as part of the previous incremental publish.
Does that sound correct given what you're seeing?
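
The difference between the two paths can be shown with a toy model. This is only a sketch of the reasoning above: the class, its fields, and its method names are hypothetical stand-ins, not the actual `StreamAppenderatorDriver` or `AppenderatorImpl` code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class SequenceScopedPushSketch {
    // Hypothetical stand-in for the Sequence->Segments map.
    private final Map<String, List<String>> segmentsBySequence = new HashMap<>();
    // Hypothetical stand-in for the full set of Sinks (sorted for stable output).
    private final TreeSet<String> sinks = new TreeSet<>();

    void addSegment(String sequence, String segment) {
        segmentsBySequence.computeIfAbsent(sequence, k -> new ArrayList<>()).add(segment);
        sinks.add(segment);
    }

    // push-like path: input is scoped to one sequence, and the sequence's
    // segments are forgotten once the publish is done.
    List<String> publish(String sequence) {
        List<String> segments = segmentsBySequence.remove(sequence);
        return segments == null ? Collections.emptyList() : segments;
    }

    // Handoff callback: the sink only disappears later, asynchronously.
    void dropAfterHandoff(String segment) {
        sinks.remove(segment);
    }

    // persistAll-like path: looks at every sink that still exists.
    List<String> persistAllTargets() {
        return new ArrayList<>(sinks);
    }

    public static void main(String[] args) {
        SequenceScopedPushSketch s = new SequenceScopedPushSketch();
        s.addSegment("seq-0", "seg-a");
        System.out.println(s.publish("seq-0"));      // only seq-0's segments
        s.addSegment("seq-1", "seg-b");
        System.out.println(s.persistAllTargets());   // still includes seg-a until handoff drops it
        s.dropAfterHandoff("seg-a");
        System.out.println(s.publish("seq-1"));      // never touches seg-a
    }
}
```

In this model the second publish never sees the earlier sequence's segment, while the persist-all view still includes it until the handoff drop runs, which is the window described in step 4.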