jon-wei commented on a change in pull request #6724: Fix issue that tasks 
failed because of no sink for identifier
URL: https://github.com/apache/incubator-druid/pull/6724#discussion_r244906073
 
 

 ##########
 File path: 
server/src/main/java/org/apache/druid/segment/realtime/appenderator/AppenderatorImpl.java
 ##########
 @@ -485,8 +486,12 @@ public void clear() throws InterruptedException
     final List<Pair<FireHydrant, SegmentIdentifier>> indexesToPersist = new 
ArrayList<>();
     int numPersistedRows = 0;
     long bytesPersisted = 0L;
-    for (SegmentIdentifier identifier : sinks.keySet()) {
-      final Sink sink = sinks.get(identifier);
+    Iterator<Map.Entry<SegmentIdentifier, Sink>> iterator = 
sinks.entrySet().iterator();
+
+    while (iterator.hasNext()) {
 
 Review comment:
   @QiuMM 
   
   > I have never observed any exceptions caused by this. And I think there is 
no need to worry about it because the program will wait for any outstanding 
pushes to finish, then abandon the segment inside the persist thread:
   
   I believe you're right that `push` is okay.
   
   In the `persistAll` case I'm guessing what's happening is something like:
   1. Task does an incremental publish for some sequence
   2. Publish completes, `driver.registerHandoff` adds a callback after handoff 
finishes that calls `appenderator.drop` -> `abandonSegment`
   3. Task does another incremental publish for another sequence
   4. Task calls `persistAll` which iterates through sinks, but during this 
iteration, the handoff callback from the previous incremental publish finishes 
and removes a sink
   
   For `push`, the input Sinks are read from the Sequence->Segments map instead 
of looking at the entire set of Sinks, and `StreamAppenderatorDriver.publish` 
removes the sequence's segments after the publish, so on the next incremental 
publish, it doesn't have the same problem of accessing Sinks that were 
handled/dropped as part of the previous incremental publish
   
   Does that sound correct given what you're seeing?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to