[
https://issues.apache.org/jira/browse/KAFKA-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053824#comment-17053824
]
ASF GitHub Bot commented on KAFKA-6145:
---------------------------------------
ableegoldman commented on pull request #8246: KAFKA-6145: Pt 2. Include offset
sums in subscription
URL: https://github.com/apache/kafka/pull/8246
KIP-441 Pt. 2: Compute sum of offsets across all stores/changelogs in a task
and include them in the subscription.
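
As a rough illustration of the offset sum described above, the sketch below adds up the checkpointed offsets of every changelog partition belonging to one task. The class name, the method name, and the plain map keyed by changelog partition name are assumptions made for illustration; they are not taken from the PR.

```java
import java.util.Map;

// Hypothetical helper, not the actual Kafka Streams code.
final class TaskOffsetSum {

    // Sums the checkpointed offsets of all changelog partitions of a single task.
    // The map key is an illustrative changelog partition name.
    static long offsetSum(Map<String, Long> changelogOffsets) {
        long sum = 0L;
        for (long offset : changelogOffsets.values()) {
            sum += offset;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Long> offsets = Map.of(
            "store-a-changelog-0", 100L,
            "store-b-changelog-0", 250L);
        System.out.println(offsetSum(offsets)); // prints 350
    }
}
```

A single number per task keeps the subscription small while still letting the assignor compare how caught up each instance is on that task.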
Previously each thread would just encode every task on disk, but we now need
to read the checkpoint file, which is unsafe to do without a lock on the task
directory. So each thread now encodes only its assigned active and standby
tasks, and ignores any already-locked tasks.
In some cases there may be unowned and unlocked tasks on disk that were
reassigned to another instance and haven't been cleaned up yet by the
background thread. Each StreamThread makes a weak effort to lock any such task
directories it finds and, if successful, is then responsible for computing and
reporting that task's offset sum (based on reading the checkpoint file).
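
The "weak effort" to lock an unowned task directory can be sketched as a single non-blocking lock attempt that gives up if anyone else holds the lock. This is a minimal sketch using java.nio file locks; the class name, method names, and the ".lock" file name are assumptions for illustration and do not reproduce the PR's code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not the actual Kafka Streams code.
final class TaskDirLocker {

    // Assumed lock-file name inside each task directory.
    static final String LOCK_FILE = ".lock";

    // Tries once to lock the task directory. Returns the held lock,
    // or null if another process or another thread in this JVM holds it.
    static FileLock tryLockTaskDir(Path taskDir) {
        try {
            FileChannel channel = FileChannel.open(
                taskDir.resolve(LOCK_FILE),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            try {
                FileLock lock = channel.tryLock(); // null if held by another process
                if (lock == null) {
                    channel.close();
                }
                return lock;
            } catch (OverlappingFileLockException e) {
                // Already locked within this JVM, e.g. by another thread.
                channel.close();
                return null;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Releases the lock and closes its underlying channel.
    static void unlock(FileLock lock) {
        try {
            lock.release();
            lock.channel().close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only a thread that gets a non-null lock back would go on to read the checkpoint file and report the offset sum; every other thread simply skips that directory.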
This PR therefore also addresses two orthogonal issues:
1) Prevent the background cleaner thread from deleting unowned stores during a
rebalance
2) Deduplicate standby tasks in subscription: each thread used to include
every (non-active) task found on disk in its "standby task" set, which meant
every active, standby, and unowned task was encoded by _every_ thread.
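
To make the deduplication in point 2 concrete, the following sketch models which tasks a single thread would report: its own assigned active and standby tasks, plus any on-disk task that no other thread has locked. Task ids are plain strings and lock ownership is modelled as a set; both are simplifying assumptions, as are all names used here.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model, not the actual Kafka Streams code.
final class SubscriptionTasks {

    // Returns the tasks one thread encodes in its subscription.
    // lockedByOthers stands in for "the lock attempt on this task directory failed".
    static Set<String> tasksToEncode(Set<String> assignedActive,
                                     Set<String> assignedStandby,
                                     Set<String> onDisk,
                                     Set<String> lockedByOthers) {
        Set<String> result = new TreeSet<>(assignedActive);
        result.addAll(assignedStandby);
        for (String task : onDisk) {
            if (!lockedByOthers.contains(task)) {
                result.add(task); // unowned and unlocked: this thread claims it
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> onDisk = Set.of("0_0", "0_1", "0_2", "0_3");
        // Thread A owns 0_0 (active) and 0_1 (standby); thread B already holds 0_2.
        Set<String> threadA = tasksToEncode(Set.of("0_0"), Set.of("0_1"), onDisk, Set.of("0_2"));
        // Thread B sees everything thread A holds as locked.
        Set<String> threadB = tasksToEncode(Set.of("0_2"), Set.of(), onDisk, threadA);
        System.out.println(threadA); // prints [0_0, 0_1, 0_3]
        System.out.println(threadB); // prints [0_2]
    }
}
```

Because each task directory can be locked by at most one thread, every task ends up in exactly one thread's set instead of in all of them.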
> Warm up new KS instances before migrating tasks - potentially a two phase
> rebalance
> -----------------------------------------------------------------------------------
>
> Key: KAFKA-6145
> URL: https://issues.apache.org/jira/browse/KAFKA-6145
> Project: Kafka
> Issue Type: New Feature
> Components: streams
> Reporter: Antony Stubbs
> Priority: Major
> Labels: needs-kip
>
> Currently, when expanding the KS cluster, the new node's partitions are
> unavailable during the rebalance. For large state stores this can take a very
> long time, and even for small state stores a pause of more than a few ms can
> be a deal breaker for microservice use cases.
> One workaround would be to execute the rebalance in two phases:
> 1) start running state store building on the new node
> 2) once the state store is fully populated on the new node, only then
> rebalance the tasks - there will still be a rebalance pause, but it would be
> greatly reduced
> Relates to: KAFKA-6144 - Allow state stores to serve stale reads during
> rebalance