Re: Statestore restoration & scaling questions - possible KIP as well.

McCaig, Rhys Thu, 07 Feb 2019 11:19:57 -0800

Adam,

I don’t have an answer for you but I would also be interested in clarification 
of this process if anyone can provide more details. If your reading is correct 
I would welcome the KIP to reduce the scaling pauses.


Cheers,
Rhys McCaig

> On Feb 6, 2019, at 7:44 AM, Adam Bellemare <[email protected]> wrote:
> 
> Bump - hoping someone has some insight. Alternately, redirection to a more
> suitable forum.
> 
> Thanks
> 
> On Sun, Feb 3, 2019 at 10:25 AM Adam Bellemare <[email protected]>
> wrote:
> 
>> Hey Folks
>> 
>> I have a few questions around the operations of stateful processing while
>> scaling nodes up/down, and a possible KIP in question #4. Most of them have
>> to do with task processing during rebuilding of state stores after scaling
>> nodes up.
>> 
>> Scenario:
>> Single node/thread, processing 2 topics (10 partitions each):
>> User event topic (events) - ie: key:userId, value: ProductId
>> Product topic (entity) - ie: key: ProductId, value: productData
>> 
>> My topology looks like this:
>> 
>> KTable productTable = ... //materialize from product topic
>> 
>> KStream output = userStream
>>    .map(x => (x.value, x.key) ) //Swap the key and value around
>>    .join(productTable, ... ) //Joiner is not relevant here
>>    .to(...)  //Send it to some output topic
>> 
>> 
>> Here are my questions:
>> 1) If I scale the processing node count up, partitions will be rebalanced
>> to the new node. Does processing continue as normal on the original node,
>> while the new node's processing is paused as the internal state stores are
>> rebuilt/reloaded? From my reading of the code (and own experience) I
>> believe this to be the case, but I am just curious in case I missed
>> something.
>> 
>> 2) What happens to the userStream map task? Will the new node be able to
>> process this task while the state store is rebuilding/reloading? My reading
>> of the code suggests that this map process will be paused on the new node
>> while the state store is rebuilt. The effect of this is that it will lead
>> to a delay in events reaching the original node's partitions, which will be
>> seen as late-arriving events. Am I right in this assessment?
>> 
>> 3) How does scaling up work with standby state-store replicas? From my
>> reading of the code, it appears that scaling a node up will result in a
>> reabalance, with the state assigned to the new node being rebuilt first
>> (leading to a pause in processing). Following this, the standy replicas are
>> populated. Am I correct in this reading?
>> 
>> 4) If my reading in #3 is correct, would it be possible to pre-populate
>> the standby stores on scale-up before initiating active-task transfer? This
>> would allow seamless scale-up and scale-down without requiring any pauses
>> for rebuilding state. I am interested in kicking this off as a KIP if so,
>> but would appreciate any JIRAs or related KIPs to read up on prior to
>> digging into this.
>> 
>> 
>> Thanks
>> 
>> Adam Bellemare
>>

Re: Statestore restoration & scaling questions - possible KIP as well.

Reply via email to