Re: [PR] Prevent multiple attempts to publish segments for the same sequence (druid)

via GitHub Tue, 07 Nov 2023 19:19:51 -0800


kfaraz commented on code in PR #14995:
URL: https://github.com/apache/druid/pull/14995#discussion_r1385896849



##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java:
##########
@@ -793,13 +793,17 @@ public void onFailure(Throwable t)
         status = Status.PUBLISHING;
       }
 
+      // The callback for a successful segment publish may remove a sequence from the publishingSequences,
+      // which is racy and can allow the same sequence to be added to the set again.
+      // Create a copy of publishing sequences to which we can only add elements, and not remove them.
+      final Set<String> publishingSequencesSnapshot = new HashSet<>(publishingSequences);

Review Comment:
   I think the new code would be simpler if we snapshot both `sequences` and `publishingSequences`, build a third set containing (`sequencesSnapshot` - `publishingSequencesSnapshot`), and iterate over that instead of `sequencesSnapshot`.
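
A minimal sketch of the set-difference idea, assuming for illustration that both collections hold plain sequence names. The class `PendingSequencesSketch` and the helper `pendingSequences` are hypothetical and not part of the Druid codebase; the real `sequences` collection may hold richer metadata objects.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only; not the code in SeekableStreamIndexTaskRunner.
public class PendingSequencesSketch
{
  // Returns the names present in `sequences` but absent from `publishingSequences`,
  // computed from point-in-time copies so later mutations of the inputs cannot affect the result.
  static Set<String> pendingSequences(Collection<String> sequences, Collection<String> publishingSequences)
  {
    // Assumption: sequences are identified by name; LinkedHashSet keeps the original iteration order.
    final Set<String> sequencesSnapshot = new LinkedHashSet<>(sequences);
    final Set<String> publishingSequencesSnapshot = new HashSet<>(publishingSequences);

    // Third set: sequencesSnapshot - publishingSequencesSnapshot.
    final Set<String> pending = new LinkedHashSet<>(sequencesSnapshot);
    pending.removeAll(publishingSequencesSnapshot);
    return pending;
  }

  public static void main(String[] args)
  {
    Set<String> pending = pendingSequences(
        Arrays.asList("seq_0", "seq_1", "seq_2"),
        Arrays.asList("seq_1")
    );
    System.out.println(pending); // prints [seq_0, seq_2]
  }
}
```

The loop would then iterate over the returned set only, so no membership check against `publishingSequences` is needed inside the loop body.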
   
   Also, I wonder if duplicates are still possible, since the result depends on when `publishingSequences` is snapshotted. Would tracking `publishedSequences` be a better solution?
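
One possible reading of the `publishedSequences` idea, sketched under assumptions: a concurrent, add-only set in which a sequence name is claimed before its publish starts and is never removed by a success callback. The class `PublishedSequencesSketch` and the method `tryClaim` are hypothetical names, and handling of failed publishes that need a retry is deliberately left out.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not the code in SeekableStreamIndexTaskRunner.
public class PublishedSequencesSketch
{
  // Add-only record of every sequence name for which a publish has been started.
  // Assumption: entries are never removed, so timing of callbacks cannot reopen a slot.
  private final Set<String> publishedSequences = ConcurrentHashMap.newKeySet();

  // Returns true only for the first caller that claims the given sequence name.
  // Set.add is atomic on a ConcurrentHashMap-backed key set, so two threads
  // racing on the same name cannot both see true.
  boolean tryClaim(String sequenceName)
  {
    return publishedSequences.add(sequenceName);
  }

  public static void main(String[] args)
  {
    PublishedSequencesSketch tracker = new PublishedSequencesSketch();
    System.out.println(tracker.tryClaim("seq_0")); // prints true
    System.out.println(tracker.tryClaim("seq_0")); // prints false
  }
}
```

Because the check and the insert happen in a single `add` call, this avoids depending on when a snapshot is taken.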



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

