jihoonson opened a new issue #6001: Segment publishing order should be preserved in kafka indexing service
URL: https://github.com/apache/incubator-druid/issues/6001

In Kafka indexing service, the overlord does a sanity check that the start offsets of the partitions of the segments currently being published are the same as the ones stored in the metastore, so that it guarantees that all segments are published in order. Because of this check, some tasks might fail in the following scenario.

1. The supervisor created a task (`T1`) with a start offset `O1`.
2. Somehow, the supervisor couldn't send an endOffset `O2` to `T1` within `taskDuration`. Instead, it sent an endOffset `O3` to `T1` after `taskDuration * 10`. (In our case, the supervisor couldn't send it because of too frequent HTTP connection refused errors.)
3. `T1` started to merge, push, and publish segments.
4. The supervisor created a new task, `T2`, with a start offset `O3`.
5. After `taskDuration`, it sent an endOffset `O4` to `T2`.
6. `T2` started to merge, push, and publish segments.
7. Since `T1` had run for a much longer time, it had many more segments to publish than `T2`. As a result, `T2` tried to publish before `T1` completed publishing.
8. `T2` failed to publish because of the sanity check when updating the metastore.

So, I think the supervisor should be able to guarantee the segment publishing order across all running tasks, like below.

```
T1: indexing ===> pushing ===> publishing ===> handoff
T2: indexing ===> pushing ===> publishing ===> handoff
T3: indexing ===> pushing ===> publishing ===> handoff
...
```

To do so, I suppose the supervisor should be able to send pushing signals to kafka tasks as well as publishing signals.
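
To make the failure mode and the proposed ordering concrete, here is a minimal Python sketch. It is not Druid code: `MetadataStore` and `OrderedPublisher` are hypothetical names, a single integer stands in for the per-partition offsets, and the offsets `0`, `300`, and `400` are made-up stand-ins for `O1`, `O3`, and `O4`. It only illustrates a compare-then-update check of the kind described above and one possible way a supervisor could release publishes in task creation order.

```python
class MetadataStore:
    """Toy stand-in for the metastore (hypothetical, not a Druid class)."""

    def __init__(self, committed_offset):
        self.committed_offset = committed_offset  # last committed end offset
        self.published = []                       # task ids, in publish order

    def publish(self, task_id, start_offset, end_offset):
        # Sanity check: the task's start offset must match the stored offset.
        if start_offset != self.committed_offset:
            return False
        self.committed_offset = end_offset
        self.published.append(task_id)
        return True


class OrderedPublisher:
    """Hypothetical supervisor-side gate: publish only in task creation order."""

    def __init__(self, store):
        self.store = store
        self.queue = []     # (task_id, start, end) in creation order
        self.ready = set()  # task ids that have finished pushing

    def register(self, task_id, start_offset, end_offset):
        self.queue.append((task_id, start_offset, end_offset))

    def mark_ready(self, task_id):
        """Record that a task finished pushing; return the task ids published now."""
        self.ready.add(task_id)
        released = []
        # Only the head of the queue may publish, so later tasks wait.
        while self.queue and self.queue[0][0] in self.ready:
            tid, start, end = self.queue.pop(0)
            if not self.store.publish(tid, start, end):
                raise RuntimeError(f"unexpected offset mismatch for {tid}")
            released.append(tid)
        return released


# Unordered case from the scenario: T2 reaches the metastore before T1.
unordered = MetadataStore(committed_offset=0)
t2_ok = unordered.publish("T2", 300, 400)  # rejected: stored offset is still 0
t1_ok = unordered.publish("T1", 0, 300)    # accepted

# Ordered case: T2 finishes pushing first but is held until T1 has published.
ordered = MetadataStore(committed_offset=0)
gate = OrderedPublisher(ordered)
gate.register("T1", 0, 300)
gate.register("T2", 300, 400)
held = gate.mark_ready("T2")      # nothing is released yet
released = gate.mark_ready("T1")  # T1 publishes, then T2
```

In the unordered case `T2` is rejected even though its data is valid, which matches step 8. In the ordered case the gate defers `T2` until `T1` has published, so both pass the check. A real implementation would also have to handle task failures and timeouts at the head of the queue, which this sketch leaves out.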
