jihoonson opened a new issue #6001: Segment publishing order should be 
preserved in kafka indexing service
URL: https://github.com/apache/incubator-druid/issues/6001
 
 
   In the Kafka indexing service, the overlord performs a sanity check that the 
start offsets of the partitions of the segments currently being published match 
the ones stored in the metastore, which guarantees that all segments are 
published in order. Because of this check, some tasks might fail in the 
following scenario.
   
   1. The supervisor created a task (`T1`) with a start offset `O1`.
   2. For some reason, the supervisor couldn't send an endOffset `O2` to `T1` 
within `taskDuration`. Instead, it sent an endOffset `O3` to `T1` after 
`taskDuration * 10`. (In our case, the supervisor couldn't send it because of 
too frequent HTTP connection refused errors.)
   3. `T1` started to merge, push, and publish segments.
   4. The supervisor created a new task, `T2`, with a start offset `O3`.
   5. After `taskDuration`, it sent an endOffset `O4` to `T2`. 
   6. `T2` started to merge, push, and publish segments.
   7. Since `T1` had run for a much longer time, it had many more segments to 
publish than `T2`. As a result, `T2` tried to publish before `T1` had finished 
publishing.
   8. `T2` failed to publish because of the sanity check when updating 
metastore.
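
A minimal sketch of the failure in steps 7 and 8, assuming a single partition and made-up offset values (`O1 = 100`, `O3 = 300`, `O4 = 400`). `MetadataStore` and `publish` are hypothetical names for illustration only, not Druid's actual classes; the real check compares offsets per partition inside a metastore transaction.

```python
class MetadataStore:
    """Toy stand-in for the committed Kafka offset kept in the metastore."""

    def __init__(self, committed_offset):
        self.committed_offset = committed_offset

    def publish(self, task_id, start_offset, end_offset):
        # Sanity check: a task may publish only if its start offset equals
        # the offset committed by the previous publish.
        if start_offset != self.committed_offset:
            return False
        self.committed_offset = end_offset
        return True


O1, O3, O4 = 100, 300, 400  # hypothetical offsets
store = MetadataStore(committed_offset=O1)

# T2 finishes pushing first and tries to publish before T1.
t2_ok = store.publish("T2", start_offset=O3, end_offset=O4)
# T1 publishes afterwards and succeeds, but T2 has already failed.
t1_ok = store.publish("T1", start_offset=O1, end_offset=O3)

print(t2_ok, t1_ok, store.committed_offset)
```

In this sketch `T2` is rejected only because of ordering: had it published after `T1`, its start offset would have matched the committed offset.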
   
   So, I think the supervisor should be able to guarantee the segment 
publishing order across all running tasks, as shown below.
   
   ```
   T1: indexing ===> pushing ===> publishing ===> handoff
                 T2: indexing ===> pushing ===> publishing ===> handoff
                               T3: indexing ===> pushing ===> publishing ===> handoff
                                             ...
   ```
   
   To do so, I suppose the supervisor should be able to send pushing signals, 
as well as publishing signals, to Kafka tasks.
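
One way the proposed ordering could be sketched, assuming the supervisor tracks tasks in creation order and lets only the oldest unpublished task publish. `PublishOrderGate` and its methods are hypothetical names for illustration, not an existing Druid API.

```python
from collections import deque


class PublishOrderGate:
    """Toy supervisor-side gate that releases publish signals in task order."""

    def __init__(self):
        self._pending = deque()  # task ids, oldest first

    def register(self, task_id):
        # Called when the supervisor creates a task.
        self._pending.append(task_id)

    def may_publish(self, task_id):
        # Only the oldest task that has not yet published may proceed.
        return bool(self._pending) and self._pending[0] == task_id

    def mark_published(self, task_id):
        if not self.may_publish(task_id):
            raise RuntimeError(f"{task_id} tried to publish out of order")
        self._pending.popleft()


gate = PublishOrderGate()
gate.register("T1")
gate.register("T2")

t2_before = gate.may_publish("T2")  # T1 has not published yet
gate.mark_published("T1")
t2_after = gate.may_publish("T2")   # now T2 is at the head of the queue

print(t2_before, t2_after)
```

With such a gate, `T2` would wait for `T1` instead of failing the sanity check, at the cost of delaying `T2`'s publish while `T1` is still pushing.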
