gianm commented on a change in pull request #7246: Fix record validation in
SeekableStreamIndexTaskRunner
URL: https://github.com/apache/incubator-druid/pull/7246#discussion_r265159511
##########
File path:
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java
##########
@@ -1882,33 +1882,35 @@ TransactionalSegmentPublisher createPublisher(TaskToolbox toolbox, boolean useTr
 }
 private boolean verifyInitialRecordAndSkipExclusivePartition(
- final OrderedPartitionableRecord<PartitionIdType, SequenceOffsetType> record,
- final Map<PartitionIdType, SequenceOffsetType> intialSequenceSnapshot
+ final OrderedPartitionableRecord<PartitionIdType, SequenceOffsetType> record
 )
 {
- if (intialSequenceSnapshot.containsKey(record.getPartitionId())) {
- if (record.getSequenceNumber().compareTo(intialSequenceSnapshot.get(record.getPartitionId())) < 0) {
Review comment:
> Here, if finish = false in setEndOffsets(), intialSequenceSnapshot was
> updated to the given end offsets which is the start offsets of the next
> sequence, S'. However, each replica can still consume some more offsets of the
> sequence S after being resumed until it reaches to the end offsets of S. This
> incurred an exception at here because the offset of the record is for the
> sequence S which should be smaller than start offsets of S'.

It sounds like this part is the heart of the bug: the code didn't allow for
continuing to read a few more messages of a prior sequence `S` before the
messages for a new sequence `S'` started showing up. And it sounds like the fix
is to compare against the `currOffsets` we think we should be reading right
now, rather than the start of the sequence. Thanks for explaining.
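
To make sure I'm reading this right, here is a minimal sketch of the difference as I understand it. It is not the actual runner code: `OffsetCheckSketch`, `isBeforeSnapshot`, `isBeforeCurrent`, the `Integer` partition ids, and the `Long` offsets are all hypothetical stand-ins, and the numbers are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the runner's offset bookkeeping; not the real Druid class.
class OffsetCheckSketch
{
  // Old-style check: is the record before the snapshot of sequence start offsets?
  static boolean isBeforeSnapshot(Map<Integer, Long> snapshot, int partition, long offset)
  {
    Long start = snapshot.get(partition);
    return start != null && Long.compare(offset, start) < 0;
  }

  // Check described in the comment: is the record before the offset we expect to read next?
  static boolean isBeforeCurrent(Map<Integer, Long> currOffsets, int partition, long offset)
  {
    Long curr = currOffsets.get(partition);
    return curr != null && Long.compare(offset, curr) < 0;
  }

  public static void main(String[] args)
  {
    // Assumed scenario: the snapshot was already moved to the start of S' (100),
    // but this replica is still reading the tail of S and expects offset 95 next.
    Map<Integer, Long> snapshot = new HashMap<>();
    Map<Integer, Long> currOffsets = new HashMap<>();
    snapshot.put(0, 100L);
    currOffsets.put(0, 95L);

    long recordOffset = 95L;
    System.out.println(isBeforeSnapshot(snapshot, 0, recordOffset)); // true: treated as an error
    System.out.println(isBeforeCurrent(currOffsets, 0, recordOffset)); // false: record is accepted
  }
}
```

In this made-up scenario the snapshot comparison flags a perfectly valid record from the tail of `S`, while the comparison against the current offset does not.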