Processor API StateStore and Recovery with State Machines question.

Adam Bellemare Sun, 22 Jul 2018 17:30:01 -0700

Hi Folks

I have a quick question about a scenario that I would appreciate some
insight on. This is related to a KIP I am working on, but I wanted to break
this out into its own scenario to reach a wider audience. In this scenario,
I am using builder.internalTopologyBuilder to create the following within
the internals of Kafka Streams:


1) Internal Topic Source (builder.internalTopologyBuilder.addSource(...) )

2) A ProcessorSupplier with a StateStore, with changelogging enabled. For
the purpose of this question, this processor is a very simple state machine.
All it does is block every other event for a given key from being
processed. For instance, given the events:
(A,1)
(A,2)
(A,3)
It would block the propagation of (A,2). The state of the system after
processing each event is:
blockNext = true
blockNext = false
blockNext = true

The expectation is that this component would always block the same event, in
any failure mode and subsequent recovery (i.e., it ALWAYS blocks (A,2), but
never (A,1) or (A,3)). In other words, it would maintain perfect state in
accordance with the offsets of the upstream and downstream elements.
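
To make the intended behaviour concrete, here is a minimal sketch of the
state machine in plain Java. It is not the actual Processor API code: the
HashMap is only a stand-in for the changelogged StateStore, and the class
and method names (AlternatingBlocker, process, run) are hypothetical names
chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AlternatingBlocker {
    // Assumption: a plain map stands in for the changelogged state store.
    // It holds one blockNext flag per key.
    private final Map<String, Boolean> store = new HashMap<>();

    /** Returns true if the event is forwarded, false if it is blocked. */
    boolean process(String key) {
        boolean blockNext = store.getOrDefault(key, false);
        store.put(key, !blockNext); // flip the flag for the next event
        return !blockNext;
    }

    /** Feeds the values for one key through a fresh state machine. */
    static List<Integer> run(String key, int... values) {
        AlternatingBlocker blocker = new AlternatingBlocker();
        List<Integer> forwarded = new ArrayList<>();
        for (int value : values) {
            if (blocker.process(key)) {
                forwarded.add(value);
            }
        }
        return forwarded;
    }

    public static void main(String[] args) {
        // (A,1), (A,2), (A,3): only (A,2) should be blocked.
        System.out.println(run("A", 1, 2, 3)); // prints [1, 3]
    }
}
```

The flag stored after each event matches the sequence listed above (true,
false, true), so the output depends entirely on the flag and the input
position staying in step.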

3) The third component is a KTable with a Materialized StateStore where I
want to sink the remaining events. It is also backed by a change log. The
events arriving would be:
(A,1)
(A,3)

The components are ordered as:
1 -> 2 -> 3


Note that I am keeping the state machine in a separate state store. My main
questions are:

1) Will this workflow remain consistent under all failure modes? For
example, are the state stores' changelogs fully written to their internal
topics before the consumer offset is committed for the source in #1?

2) Is it possible that one State Store with changelogging will be logged to
Kafka safely (say component #3) but the other (#2) will not be, prior to a
sudden, hard termination of the node?

3) Is the converse possible, where #2 is backed up to its Kafka topic but
#3 is not? Does the ordering of the topology matter in this case?

4) Is it possible that the state store #2 is updated and logged, but the
source topic (#1) offset is not updated?

In all of these cases, my main concern is keeping the state and the
expected output consistent. For any failure mode, will I be able to recover
to a fully consistent state given the requirements of the state machine in
#2?
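
The concern in question 4 can be illustrated with a small simulation in
plain Java. This is only a sketch of the failure I am worried about, not a
claim about how Kafka Streams actually behaves: the replay method, the
integer offsets, and the restored flag values are all hypothetical
stand-ins for the committed source offset and the state restored from the
changelog.

```java
import java.util.ArrayList;
import java.util.List;

final class ReplaySketch {
    /**
     * Reprocesses values[fromOffset..] starting from the restored blockNext
     * flag and returns the values that would be forwarded downstream.
     */
    static List<Integer> replay(int[] values, int fromOffset, boolean blockNext) {
        List<Integer> forwarded = new ArrayList<>();
        for (int i = fromOffset; i < values.length; i++) {
            if (!blockNext) {
                forwarded.add(values[i]);
            }
            blockNext = !blockNext; // same toggle as the state machine in #2
        }
        return forwarded;
    }

    public static void main(String[] args) {
        int[] events = {1, 2, 3}; // values of (A,1), (A,2), (A,3)

        // Consistent recovery: the offset and the state both reflect (A,1).
        System.out.println(replay(events, 1, true));  // prints [3]

        // Hypothetical inconsistent recovery: the state already reflects
        // (A,2) (blockNext = false), but the offset still points at (A,2).
        System.out.println(replay(events, 1, false)); // prints [2]
    }
}
```

In the second case (A,2) is forwarded and (A,3) is blocked, which is
exactly the divergence I want to rule out.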

Though this is a trivial example, I am not certain about the dynamics
between maintaining state, recovering from internal changelog topics, and
the order in which all of these things apply. Any words of wisdom or
explanations would be helpful here. I have been looking through the code
but I wanted to get second opinions on this.



Thanks,

Adam
