Adam, I don’t have an answer for you but I would also be interested in clarification of this process if anyone can provide more details. If your reading is correct I would welcome the KIP to reduce the scaling pauses.
Cheers, Rhys McCaig > On Feb 6, 2019, at 7:44 AM, Adam Bellemare <[email protected]> wrote: > > Bump - hoping someone has some insight. Alternately, redirection to a more > suitable forum. > > Thanks > > On Sun, Feb 3, 2019 at 10:25 AM Adam Bellemare <[email protected]> > wrote: > >> Hey Folks >> >> I have a few questions around the operations of stateful processing while >> scaling nodes up/down, and a possible KIP in question #4. Most of them have >> to do with task processing during rebuilding of state stores after scaling >> nodes up. >> >> Scenario: >> Single node/thread, processing 2 topics (10 partitions each): >> User event topic (events) - ie: key:userId, value: ProductId >> Product topic (entity) - ie: key: ProductId, value: productData >> >> My topology looks like this: >> >> KTable productTable = ... //materialize from product topic >> >> KStream output = userStream >> .map(x => (x.value, x.key) ) //Swap the key and value around >> .join(productTable, ... ) //Joiner is not relevant here >> .to(...) //Send it to some output topic >> >> >> Here are my questions: >> 1) If I scale the processing node count up, partitions will be rebalanced >> to the new node. Does processing continue as normal on the original node, >> while the new node's processing is paused as the internal state stores are >> rebuilt/reloaded? From my reading of the code (and own experience) I >> believe this to be the case, but I am just curious in case I missed >> something. >> >> 2) What happens to the userStream map task? Will the new node be able to >> process this task while the state store is rebuilding/reloading? My reading >> of the code suggests that this map process will be paused on the new node >> while the state store is rebuilt. The effect of this is that it will lead >> to a delay in events reaching the original node's partitions, which will be >> seen as late-arriving events. Am I right in this assessment? >> >> 3) How does scaling up work with standby state-store replicas? From my >> reading of the code, it appears that scaling a node up will result in a >> reabalance, with the state assigned to the new node being rebuilt first >> (leading to a pause in processing). Following this, the standy replicas are >> populated. Am I correct in this reading? >> >> 4) If my reading in #3 is correct, would it be possible to pre-populate >> the standby stores on scale-up before initiating active-task transfer? This >> would allow seamless scale-up and scale-down without requiring any pauses >> for rebuilding state. I am interested in kicking this off as a KIP if so, >> but would appreciate any JIRAs or related KIPs to read up on prior to >> digging into this. >> >> >> Thanks >> >> Adam Bellemare >>
