jihoonson commented on a change in pull request #7246: Fix record validation in
SeekableStreamIndexTaskRunner
URL: https://github.com/apache/incubator-druid/pull/7246#discussion_r264981043
##########
File path:
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java
##########
@@ -1882,33 +1882,35 @@ TransactionalSegmentPublisher createPublisher(TaskToolbox toolbox, boolean useTr
}
private boolean verifyInitialRecordAndSkipExclusivePartition(
- final OrderedPartitionableRecord<PartitionIdType, SequenceOffsetType> record,
- final Map<PartitionIdType, SequenceOffsetType> intialSequenceSnapshot
+ final OrderedPartitionableRecord<PartitionIdType, SequenceOffsetType> record
)
{
- if (intialSequenceSnapshot.containsKey(record.getPartitionId())) {
- if (record.getSequenceNumber().compareTo(intialSequenceSnapshot.get(record.getPartitionId())) < 0) {
Review comment:
I think checking against `intialSequenceSnapshot` is wrong. Before this PR,
`intialSequenceSnapshot` contained the start offsets of the current sequence.
Comparing the offset of the read record with `intialSequenceSnapshot` means
that a rewind would be accepted as long as the rewound offset is still larger
than the one in `intialSequenceSnapshot`, which I don't think should be allowed.
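To illustrate the point about rewinding, here is a minimal sketch. It is not the Druid code: `SnapshotCheckSketch`, `passesSnapshotCheck`, and the offsets are made up for illustration, partitions and offsets are simplified to `Integer` and `Long`, and the check is assumed to reject only records whose offset is below the snapshot.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the removed check; names and types are assumptions.
final class SnapshotCheckSketch
{
  // A record passes unless its offset is below the snapshot offset of its partition.
  public static boolean passesSnapshotCheck(
      Map<Integer, Long> initialSequenceSnapshot,
      int partitionId,
      long offset
  )
  {
    Long start = initialSequenceSnapshot.get(partitionId);
    return start == null || offset >= start;
  }

  public static void main(String[] args)
  {
    Map<Integer, Long> snapshot = new HashMap<>();
    snapshot.put(0, 100L); // start offset of the current sequence for partition 0

    long lastReadOffset = 150L; // the task has already read up to here
    long rewoundOffset = 120L;  // the next record comes from an earlier offset

    boolean accepted = passesSnapshotCheck(snapshot, 0, rewoundOffset);
    boolean isRewind = rewoundOffset <= lastReadOffset;

    // prints "accepted=true, isRewind=true"
    System.out.println("accepted=" + accepted + ", isRewind=" + isRewind);
  }
}
```

The rewound record passes because the check only knows where the sequence started, not how far the task has already read.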
The bug reported in #7239 happens while checkpointing with multiple
replicas. During the checkpoint, the supervisor pauses all replica tasks and
finds the max offsets of the current sequence, `S`. It then sets those max
offsets as the end offsets for all replicas. Here, if `finish = false` in
`setEndOffsets()`, [`intialSequenceSnapshot` was updated to the given end
offsets](https://github.com/apache/incubator-druid/pull/7246/files/bbe29c2beca775f5806cf841f681bb7ad637325d#diff-2512ef23844750284130758031054081L1445),
which are the start offsets of the next sequence, `S'`. However, each replica
can still consume more offsets of the sequence `S` after being resumed,
until it reaches the end offsets of `S`. This caused an exception
[here](https://github.com/apache/incubator-druid/pull/7246/files/bbe29c2beca775f5806cf841f681bb7ad637325d#diff-2512ef23844750284130758031054081L1890)
because the offset of the record belongs to the sequence `S` and is therefore
smaller than the start offsets of `S'`.
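The checkpoint scenario can be sketched roughly as follows. This is a simplified illustration rather than the actual task runner: `CheckpointRaceSketch`, `isBelowSnapshot`, and the concrete offsets are hypothetical, and a single partition with `Long` offsets stands in for the generic types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical walk-through of the checkpoint scenario; not the Druid implementation.
final class CheckpointRaceSketch
{
  // True when the record offset is below the snapshot offset of its partition,
  // which is the condition the removed check treated as an error.
  public static boolean isBelowSnapshot(Map<Integer, Long> snapshot, int partitionId, long offset)
  {
    Long start = snapshot.get(partitionId);
    return start != null && offset < start;
  }

  public static void main(String[] args)
  {
    Map<Integer, Long> snapshot = new HashMap<>();
    snapshot.put(0, 100L); // start offset of sequence S

    long thisReplicaOffset = 130L; // where this replica was paused
    long maxAcrossReplicas = 150L; // max offset the supervisor found among replicas

    // Assumed effect of setEndOffsets() with finish = false before this PR:
    // the snapshot is overwritten with the end offsets of S (= start offsets of S').
    snapshot.put(0, maxAcrossReplicas);

    // After resuming, this replica still has to read 130..150 of S.
    boolean rejected = isBelowSnapshot(snapshot, 0, thisReplicaOffset);

    // prints "rejected=true"
    System.out.println("rejected=" + rejected);
  }
}
```

The record at offset 130 is a legitimate record of `S`, but it is compared against the start offset of `S'` and fails the check.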