gianm commented on a change in pull request #7246: Fix record validation in SeekableStreamIndexTaskRunner
URL: https://github.com/apache/incubator-druid/pull/7246#discussion_r265159511
 
 

 ##########
 File path: indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java
 ##########
 @@ -1882,33 +1882,35 @@ TransactionalSegmentPublisher 
createPublisher(TaskToolbox toolbox, boolean useTr
   }
 
   private boolean verifyInitialRecordAndSkipExclusivePartition(
-      final OrderedPartitionableRecord<PartitionIdType, SequenceOffsetType> record,
-      final Map<PartitionIdType, SequenceOffsetType> intialSequenceSnapshot
+      final OrderedPartitionableRecord<PartitionIdType, SequenceOffsetType> record
   )
   {
-    if (intialSequenceSnapshot.containsKey(record.getPartitionId())) {
-      if (record.getSequenceNumber().compareTo(intialSequenceSnapshot.get(record.getPartitionId())) < 0) {
 
 Review comment:
   > Here, if finish = false in setEndOffsets(), intialSequenceSnapshot was 
updated to the given end offsets, which are the start offsets of the next 
sequence, S'. However, each replica can still consume some more offsets of the 
sequence S after being resumed, until it reaches the end offsets of S. This 
caused an exception here because the offset of the record belongs to the 
sequence S, and so is smaller than the start offsets of S'.
   
   It sounds like this part is the heart of the bug: the code didn't allow for 
continuing to read a few more messages of a prior sequence `S` before the 
messages for a new sequence `S'` started showing up. And it sounds like the fix 
is to compare against the `currOffsets` we think we should be reading right 
now, rather than the start of the sequence. Thanks for explaining.
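
To make sure I'm reading this right, here is a minimal sketch of the two checks as I understand them. It is not the code in this PR: the class, method names, and the `String`/`Long` partition and offset types are simplified stand-ins I made up for illustration, and the real runner works with generic partition and sequence-number types.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: hypothetical names and simplified types, not the Druid classes.
class OffsetCheckSketch {
    // Old-style check: compare the record against the snapshot of sequence start offsets.
    // Once the snapshot is overwritten with the start of S', trailing records of S fail it.
    static boolean passesSnapshotCheck(Map<String, Long> sequenceStartSnapshot,
                                       String partition, long recordOffset) {
        Long start = sequenceStartSnapshot.get(partition);
        return start == null || recordOffset >= start;
    }

    // Revised check: compare the record against the offset we expect to read next.
    static boolean passesCurrentOffsetCheck(Map<String, Long> currOffsets,
                                            String partition, long recordOffset) {
        Long expected = currOffsets.get(partition);
        return expected != null && recordOffset >= expected;
    }

    public static void main(String[] args) {
        // The task is still reading sequence S and expects offset 50 next.
        Map<String, Long> currOffsets = new HashMap<>();
        currOffsets.put("p0", 50L);

        // The end offsets of S (= start offsets of S') were set to 100,
        // and the snapshot was updated to match.
        Map<String, Long> snapshot = new HashMap<>();
        snapshot.put("p0", 100L);

        long recordOffset = 50L; // a remaining record of S
        System.out.println(passesSnapshotCheck(snapshot, "p0", recordOffset));         // false
        System.out.println(passesCurrentOffsetCheck(currOffsets, "p0", recordOffset)); // true
    }
}
```

In this toy setup the record at offset 50 is rejected by the snapshot comparison but accepted by the comparison against the current offsets, which matches the failure described in the quoted explanation.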

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
