jon-wei opened a new pull request #8870: Additional Kinesis resharding fixes
URL: https://github.com/apache/incubator-druid/pull/8870
 
 
   This PR fixes two issues with Kinesis resharding:
   - When shards expired after the retention period, old tasks created under 
the previous shard set (with potentially different task group -> shard 
mappings) could incorrectly add shards to task groups when the supervisor 
checkpoints those tasks
   - When a task is assigned only closed shards and doesn't read any data from 
them (either the closed shard had no additional records but this condition 
hasn't been detected yet, or all additional records in the shard are 
unparseable/thrown away), the task wouldn't publish anything and thus the 
supervisor would not be informed that those shards have been closed and fully 
read. The supervisor would then repeatedly to assign these shards to tasks 
which continue to read and publish nothing.
   
   The first issue is fixed by:
   - The map that used to track task group -> shard and shard -> offset 
mappings has been split into two maps, so that updating the shard offsets for 
checkpointing will not affect the task group -> shard mappings.
   - The Kinesis supervisor has been adjusted so that the `partitionIds` list 
can only change in `updatePartitionDataFromStream()`
   - When a shard is detected as closed in `updatePartitionDataFromStream()`, 
it will be immediately removed from the `partitionIds` list and no longer 
assigned to future tasks
   - In `discoverTasks()`, the Kinesis supervisor will immediately terminate 
any currently running tasks that have shards known to be closed or expired.
   - The general idea behind the changes above is to remove the unnecessary 
link between group -> shard and shard -> offset mappings and then allow shard 
removals to be handled similarly to taskCount partition reassignments
   
   The second issue is fixed by:
   - When the Kinesis task publishes nothing, it will check if it has hit EOS 
on any shards assigned to it. If so, it will send a new task action to the 
supervisor informing it of these closed shards, which will then be updated in 
the metadata store.
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/incubator-druid/blob/master/dev/code-review/concurrency.md)
   - [x] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [x] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to