jerchung opened a new issue #9486: Constant pauses and resumes for Kafka 
Indexing Service Tasks on empty topics when intermediateHandoffPeriod is 
configured
URL: https://github.com/apache/druid/issues/9486
 
 
   For Kafka supervisor specs that are configured with an 
`intermediateHandoffPeriod`, tasks can end up being paused and resumed 
constantly when their assigned partitions receive no events within the 
`intermediateHandoffPeriod`.
   
   ### Affected Version
   
   0.17.0
   
   ### Description
   The run loop of the `SeekableStreamIndexTaskRunner` has a check that 
triggers a checkpoint of the assigned metadata once the current time passes 
`nextCheckpointTime`. 
   
   
https://github.com/apache/druid/blob/a6776648112917b72c077ba3ac0cb7f61993a2d0/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java#L766
   
   This `nextCheckpointTime` is set by the [`resetNextCheckpointTime` 
method](https://github.com/apache/druid/blob/a6776648112917b72c077ba3ac0cb7f61993a2d0/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java#L1759)
 at the [initialization of the 
task](https://github.com/apache/druid/blob/a6776648112917b72c077ba3ac0cb7f61993a2d0/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java#L272),
 and [every time that the task is 
resumed](https://github.com/apache/druid/blob/a6776648112917b72c077ba3ac0cb7f61993a2d0/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java#L1707).
   
   However, when the latest offsets in the assigned partitions match the start 
offsets of the task (i.e. when the assigned partitions do not receive any 
events), the task is resumed but the `resetNextCheckpointTime` method is never 
called. As a result, `nextCheckpointTime` stays at the value set at 
initialization, and the [checkpoint interval 
check](https://github.com/apache/druid/blob/a6776648112917b72c077ba3ac0cb7f61993a2d0/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java#L765)
 passes on every iteration of the run loop, causing the task to pause and 
resume itself for checkpointing over and over again.
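
   To illustrate the loop described above, here is a minimal simulation, not the actual Druid code. The class `StuckCheckpointSketch`, the method `countPauses`, and the simulated clock are all hypothetical names made up for this sketch; the only assumption carried over from the task runner is that a checkpoint is requested whenever the current time is past `nextCheckpointTime`.

```java
// Hypothetical model of the run loop; not taken from SeekableStreamIndexTaskRunner.
class StuckCheckpointSketch {

    /**
     * Counts how often the simulated task pauses for a checkpoint when its
     * partitions never receive events.
     *
     * @param handoffPeriodMillis stand-in for intermediateHandoffPeriod
     * @param tickMillis          simulated duration of one run-loop iteration
     * @param iterations          number of run-loop iterations to simulate
     */
    static int countPauses(long handoffPeriodMillis, long tickMillis, int iterations) {
        long now = 0;
        // Set once at initialization, as resetNextCheckpointTime would do.
        long nextCheckpointTime = now + handoffPeriodMillis;
        int pauses = 0;

        for (int i = 0; i < iterations; i++) {
            now += tickMillis;
            if (now > nextCheckpointTime) {
                // The task pauses and a checkpoint is requested.
                pauses++;
                // The task is then resumed with end offsets equal to its start
                // offsets. In this model, as in the report, that path does NOT
                // move nextCheckpointTime forward, so the check passes again
                // on the very next iteration.
            }
        }
        return pauses;
    }

    public static void main(String[] args) {
        System.out.println(countPauses(1000, 100, 30));
    }
}
```

   With a 1000 ms period and 100 ms per iteration, the first ten iterations stay within the deadline and every one of the remaining twenty iterations triggers a pause.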
   
   I would imagine a naive fix would be to ensure that the checkpoint time is 
still moved forward even when the endOffsets are the same as the starting 
offsets, but I'm not familiar enough with the code to understand the 
ramifications of such a change.
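
   As a rough sketch of what that naive fix could look like (this is not a patch against Druid, and the class `ResumeResetSketch` and its methods `checkpointDue` and `onResume` are hypothetical), the resume handler would move the checkpoint time forward unconditionally, whether or not the offsets changed:

```java
import java.util.Map;

// Hypothetical model of the proposed fix; names do not come from Druid.
class ResumeResetSketch {
    private final long handoffPeriodMillis; // stand-in for intermediateHandoffPeriod
    private long nextCheckpointTime;

    ResumeResetSketch(long handoffPeriodMillis, long nowMillis) {
        this.handoffPeriodMillis = handoffPeriodMillis;
        // Initial deadline, set once at task initialization.
        this.nextCheckpointTime = nowMillis + handoffPeriodMillis;
    }

    /** Mirrors the run-loop check: is a checkpoint due at the given time? */
    boolean checkpointDue(long nowMillis) {
        return nowMillis > nextCheckpointTime;
    }

    /** Called when the task is resumed after a checkpoint request. */
    void onResume(Map<Integer, Long> startOffsets, Map<Integer, Long> endOffsets, long nowMillis) {
        if (!startOffsets.equals(endOffsets)) {
            // A real task would start a new sequence here (omitted in this sketch).
        }
        // Proposed change: always push the deadline forward, even when the
        // end offsets equal the start offsets because no events arrived.
        nextCheckpointTime = nowMillis + handoffPeriodMillis;
    }

    public static void main(String[] args) {
        ResumeResetSketch task = new ResumeResetSketch(1000, 0);
        task.onResume(Map.of(0, 5L), Map.of(0, 5L), 1100);
        System.out.println(task.checkpointDue(1200));
    }
}
```

   In this model an empty topic produces at most one pause per handoff period instead of one per run-loop iteration. Whether resetting the timer on that path is safe in the real task runner is exactly the open question raised above.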
   

