[ 
https://issues.apache.org/jira/browse/KAFKA-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053824#comment-17053824
 ] 

ASF GitHub Bot commented on KAFKA-6145:
---------------------------------------

ableegoldman commented on pull request #8246: KAFKA-6145: Pt 2. Include offset 
sums in subscription
URL: https://github.com/apache/kafka/pull/8246
 
 
   KIP-441 Pt. 2: Compute the sum of offsets across all stores/changelogs in a 
task and include it in the subscription.
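
As a rough illustration of the offset sum described above, the following Java sketch adds up the checkpointed offsets of one task's changelog partitions. The class name `TaskOffsetSum`, the method `sumOfChangelogOffsets`, and the map from changelog partition name to offset are assumptions made for this example; they are not taken from the PR.

```java
import java.util.Map;

// Hypothetical helper, not part of Kafka Streams: illustrates summing
// the offsets of all changelog partitions that belong to one task.
final class TaskOffsetSum {

    // changelogOffsets maps a changelog partition name (assumed format)
    // to the last checkpointed offset for that partition.
    static long sumOfChangelogOffsets(Map<String, Long> changelogOffsets) {
        long sum = 0L;
        for (long offset : changelogOffsets.values()) {
            sum += offset;
        }
        return sum;
    }
}
```

A task with two stores whose changelogs are checkpointed at offsets 10 and 32 would report an offset sum of 42 under this sketch.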
   
   Previously each thread would just encode every task on disk, but we now need 
to read the changelog file, which is unsafe to do without a lock on the task 
directory. So each thread now encodes only its assigned active and standby 
tasks and ignores any already-locked tasks.
   
   In some cases there may be unowned and unlocked tasks on disk that were 
reassigned to another instance and haven't been cleaned up yet by the 
background thread. Each StreamThread makes a weak effort to lock any such task 
directories it finds and, if successful, is then responsible for computing and 
reporting that task's offset sum (based on reading the checkpoint file).
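
The "weak effort to lock" idea can be sketched in Java as follows. This is a simplified in-memory model, not the PR's implementation: the class `TaskDirectoryLocks`, its methods, and the use of plain strings for task ids and thread names are all assumptions for illustration. A thread reports only the tasks it manages to lock, so a task already locked by another thread is skipped rather than waited on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for task directory locks.
final class TaskDirectoryLocks {
    // Maps a task id to the name of the thread that currently holds its lock.
    private final Map<String, String> ownerByTask = new ConcurrentHashMap<>();

    // Non-blocking attempt: succeeds if the task is unlocked or already
    // locked by the same thread, and fails immediately otherwise.
    boolean tryLock(String taskId, String threadName) {
        String current = ownerByTask.putIfAbsent(taskId, threadName);
        return current == null || current.equals(threadName);
    }

    // Returns the tasks found on disk that this thread locked and is
    // therefore responsible for reporting; already-locked tasks are ignored.
    List<String> tasksToReport(List<String> tasksOnDisk, String threadName) {
        List<String> result = new ArrayList<>();
        for (String taskId : tasksOnDisk) {
            if (tryLock(taskId, threadName)) {
                result.add(taskId);
            }
        }
        return result;
    }
}
```

Because each task can be locked by only one thread at a time in this model, each task on disk ends up reported by at most one thread.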
   
   This PR therefore also addresses two orthogonal issues:
   1) Prevent background cleaner thread from deleting unowned stores during a 
rebalance
   2) Deduplicate standby tasks in subscription: each thread used to include 
every (non-active) task found on disk in its "standby task" set, which meant 
every active, standby, and unowned task was encoded by _every_ thread.
 


> Warm up new KS instances before migrating tasks - potentially a two phase 
> rebalance
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-6145
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6145
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Antony Stubbs
>            Priority: Major
>              Labels: needs-kip
>
> Currently, when expanding the KS cluster, the new node's partitions are 
> unavailable during the rebalance. For large state stores this can take a very 
> long time, and even for small state stores a pause of more than a few ms can 
> be a deal breaker for microservice use cases.
> One workaround would be to execute the rebalance in two phases:
> 1) start running state store building on the new node
> 2) once the state store is fully populated on the new node, only then 
> rebalance the tasks - there will still be a rebalance pause, but it would be 
> greatly reduced
> Relates to: KAFKA-6144 - Allow state stores to serve stale reads during 
> rebalance


