jon-wei opened a new pull request #8870: Additional Kinesis resharding fixes URL: https://github.com/apache/incubator-druid/pull/8870 This PR fixes two issues with Kinesis resharding: - When shards expired after the retention period, old tasks created under the previous shard set (with potentially different task group -> shard mappings) could incorrectly add shards to task groups when the supervisor checkpoints those tasks - When a task is assigned only closed shards and doesn't read any data from them (either the closed shard had no additional records but this condition hasn't been detected yet, or all additional records in the shard are unparseable/thrown away), the task wouldn't publish anything and thus the supervisor would not be informed that those shards have been closed and fully read. The supervisor would then repeatedly to assign these shards to tasks which continue to read and publish nothing. The first issue is fixed by: - The map that used to track task group -> shard and shard -> offset mappings has been split into two maps, so that updating the shard offsets for checkpointing will not affect the task group -> shard mappings. - The Kinesis supervisor has been adjusted so that the `partitionIds` list can only change in `updatePartitionDataFromStream()` - When a shard is detected as closed in `updatePartitionDataFromStream()`, it will be immediately removed from the `partitionIds` list and no longer assigned to future tasks - In `discoverTasks()`, the Kinesis supervisor will immediately terminate any currently running tasks that have shards known to be closed or expired. - The general idea behind the changes above is to remove the unnecessary link between group -> shard and shard -> offset mappings and then allow shard removals to be handled similarly to taskCount partition reassignments The second issue is fixed by: - When the Kinesis task publishes nothing, it will check if it has hit EOS on any shards assigned to it. If so, it will send a new task action to the supervisor informing it of these closed shards, which will then be updated in the metadata store. This PR has: - [x] been self-reviewed. - [ ] using the [concurrency checklist](https://github.com/apache/incubator-druid/blob/master/dev/code-review/concurrency.md) - [x] added documentation for new or modified features or behaviors. - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links. - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader. - [x] added unit tests or modified existing tests to cover new code paths. - [ ] added integration tests. - [x] been tested in a test Druid cluster.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
