jerchung opened a new issue #9486: Constant pauses and resumes for Kafka Indexing Service Tasks on empty topics when intermediateHandoffPeriod is configured

URL: https://github.com/apache/druid/issues/9486

For Kafka supervisor specs configured with an `intermediateHandoffPeriod`, tasks can be paused and resumed constantly if no events arrive in the task's assigned partitions within the `intermediateHandoffPeriod`.

### Affected Version

0.17.0

### Description

The run loop of `SeekableStreamIndexTaskRunner` has a check that publishes a checkpoint for the assigned metadata once the current time passes `nextCheckpointTime`.

https://github.com/apache/druid/blob/a6776648112917b72c077ba3ac0cb7f61993a2d0/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java#L766

`nextCheckpointTime` is set by the [`resetNextCheckpointTime` method](https://github.com/apache/druid/blob/a6776648112917b72c077ba3ac0cb7f61993a2d0/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java#L1759) at the [initialization of the task](https://github.com/apache/druid/blob/a6776648112917b72c077ba3ac0cb7f61993a2d0/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java#L272) and [every time the task is resumed](https://github.com/apache/druid/blob/a6776648112917b72c077ba3ac0cb7f61993a2d0/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java#L1707). However, when the latest offsets in the assigned partitions match the start offsets of the task (i.e., when the assigned partitions do not receive any events), the task is resumed but `resetNextCheckpointTime` is never called.
This means that `nextCheckpointTime` stays at the value set during initialization, so the [checkpoint interval check](https://github.com/apache/druid/blob/a6776648112917b72c077ba3ac0cb7f61993a2d0/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java#L765) passes on every iteration of the run loop, and the task pauses and resumes itself for checkpointing over and over again.

A naive fix would be to move the checkpoint time forward even when the end offsets are the same as the start offsets, but I'm not familiar enough with the code to understand the ramifications of such a change.
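The failure mode, and the effect of the naive fix, can be illustrated with a small self-contained Java model. This is a sketch under stated assumptions, not Druid's code: `CheckpointTimerSketch`, `runLoopIteration`, `setEndOffsetsAndResume`, and `simulateEmptyTopic` are hypothetical names, time is passed in explicitly instead of being read from the wall clock, and the offset comparison only approximates the early-resume path described above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, heavily simplified model of the checkpoint timer described in
 * the issue. NOT Druid code: names are illustrative and time is injected.
 */
class CheckpointTimerSketch {
  private final long intermediateHandoffPeriodMillis;
  private final boolean applyNaiveFix;
  private Map<Integer, Long> startOffsets;
  private long nextCheckpointTime;
  private boolean paused;
  private int pauseCount;

  CheckpointTimerSketch(long periodMillis, Map<Integer, Long> startOffsets, boolean applyNaiveFix, long now) {
    this.intermediateHandoffPeriodMillis = periodMillis;
    this.startOffsets = new HashMap<>(startOffsets);
    this.applyNaiveFix = applyNaiveFix;
    resetNextCheckpointTime(now); // timer is set once at task initialization
  }

  private void resetNextCheckpointTime(long now) {
    nextCheckpointTime = now + intermediateHandoffPeriodMillis;
  }

  /** One run-loop iteration: pause to request a checkpoint once the timer has expired. */
  void runLoopIteration(long now) {
    if (!paused && now > nextCheckpointTime) {
      paused = true;
      pauseCount++;
    }
  }

  /** Models the supervisor setting end offsets and resuming the task. */
  void setEndOffsetsAndResume(Map<Integer, Long> endOffsets, long now) {
    if (endOffsets.equals(startOffsets)) {
      // No events were read: the task resumes early without resetting the timer...
      if (applyNaiveFix) {
        resetNextCheckpointTime(now); // ...unless the naive fix from the issue is applied
      }
      paused = false;
      return;
    }
    startOffsets = new HashMap<>(endOffsets); // next sequence starts where this one ended
    resetNextCheckpointTime(now);
    paused = false;
  }

  int getPauseCount() {
    return pauseCount;
  }

  /** Simulates an empty topic: the end offsets always equal the start offsets. */
  static int simulateEmptyTopic(boolean applyNaiveFix, long periodMillis, long stepMillis, int steps) {
    Map<Integer, Long> offsets = Map.of(0, 100L);
    long now = 0;
    CheckpointTimerSketch task = new CheckpointTimerSketch(periodMillis, offsets, applyNaiveFix, now);
    for (int i = 0; i < steps; i++) {
      now += stepMillis;
      task.runLoopIteration(now);
      if (task.paused) {
        task.setEndOffsetsAndResume(offsets, now);
      }
    }
    return task.getPauseCount();
  }

  public static void main(String[] args) {
    System.out.println("without fix: " + simulateEmptyTopic(false, 10_000, 1_000, 60));
    System.out.println("with naive fix: " + simulateEmptyTopic(true, 10_000, 1_000, 60));
  }
}
```

In this model, with a 10-second handoff period and 1-second loop steps over 60 seconds, the unfixed version pauses on every step after the first expiry (50 pauses), while the version that also resets the timer on the early-resume path pauses once per period (5 pauses). The model only shows that moving the timer forward stops the loop; it does not address the wider ramifications in Druid that the issue raises.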
