Ah yes of course, this was an oversight, I completely ignored the multiple
processors sharing the same state store when writing up the KIP.  Which is
funny, because I've actually done this (different processors sharing state
stores) a fair amount myself, and I've settled on a pattern where I group
the Processors in an enclosing class, and that enclosing class handles as
much as possible.  Here's a gist showing the rough structure, just for
context: https://gist.github.com/pgwhalen/57a00dcc2269b7610e1aaeb1549b3b65
. Note how it adds the stores to the topology, as well as providing a
public method with the store names.

I don't think my proposal completely conflicts with the multiple processors
sharing state stores use case, since you can create a supplier that
provides the store name you want, somewhat independently of your actual
Processor logic.  The issue I do see though, is that
topology.addStateStore() can only be called once for a given store.  So for
your example, if the there was a single TransformerSupplier that was passed
into both transform() calls, "store1" would be added (under the hood) to
the topology twice, which is no good.

Perhaps this suggests that one of my alternatives on the KIP might be
desirable: either not having the suppliers return StoreBuilders (just store
names), or not deprecating the old methods that take "String...
stateStoreNames". I'll have to think about it a bit.

Paul

On Sun, Dec 9, 2018 at 11:57 PM Guozhang Wang <wangg...@gmail.com> wrote:

> Hello Paul,
>
> Thanks for the great writeup (very detailed and crystal motivation
> sections!).
>
> This is quite an interesting idea and I do like the API cleanness you
> proposed. The original motivation of letting StreamsTopology to add state
> stores though, is to allow different processors to share the state store.
> For example:
>
> builder.addStore("store1");
>
> // a path of stream transformations that leads to KStream stream1.
> stream1.transform(..., "store1");
>
> // another path that generates a KStream stream2.
> stream2.transform(..., "store1");
>
> Behind the scene, Streams will make sure stream1 / stream2 transformations
> will always be grouped together as a single group of tasks, each of which
> will be executed by a single thread and hence there's no concurrency issues
> on accessing the store from different operators within the same task. I'm
> not sure how common this use case is, but I'd like to hear if you have any
> thoughts maintaining this since the current proposal seems exclude this
> possibility.
>
>
> Guozhang
>
>
> On Sun, Dec 9, 2018 at 4:18 PM Paul Whalen <pgwha...@gmail.com> wrote:
>
> > Here's KIP-401 for discussion, a minor Kafka Streams API change that I
> > think could greatly increase the usability of the low-level processor
> API.
> > I have some code written but will wait to see if there is buy in before
> > going all out and creating a pull request.  It seems like most of the
> work
> > would be in updating documentation and tests.
> >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97553756
> >
> > Thanks!
> > Paul
> >
>
>
> --
> -- Guozhang
>

Reply via email to