John and Ryanne,

Thanks for the responses! I think Ryanne's way of describing the question
is actually a much better summary than my long-winded description: whether
"a Streams app can switch between topics with and without a cluster alias
prefix when you migrate between prod and pre-prod, while preserving state."

To address a few of John's points...

> But, the prod app will still be running, and its changelog will still be
> mirrored into pre-prod when you start the pre-prod app.
>
The idea is actually to turn off the mirroring from prod to pre-prod during
the testing period, so the environments can operate completely independently
and their state can freely diverge.  After the testing period we'd be happy
to throw away everything in pre-prod and start mirroring again from prod
with a blank slate.

> Also, the pre-prod app won't be in the same consumer group as the prod app,
> so it won't know from what offset to start processing input.
>
This is where I'm hoping the magic of MM2 will come in - at the time we
shut off mirroring from prod to pre-prod in order to spin off the pre-prod
environment, we will do an "offset translation" with RemoteClusterUtils
like Ryanne mentioned, so new Streams apps in pre-prod will see consumer
offsets that make sense for reading from pre-prod topics.
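
To make that step concrete, here's a rough sketch of the one-off cutover
tool I imagine running.  It assumes RemoteClusterUtils ends up exposing a
translateOffsets() call roughly as described in KIP-382, and all of the
group names, hostnames, and configs below are just placeholders for
illustration:

import java.time.Duration;
import java.util.Map;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;

public class PreProdCutover {
    public static void main(String[] args) throws Exception {
        // Client config for the pre-prod cluster, where MM2 writes its checkpoints.
        Map<String, Object> preProdConfig = Map.of(
                "bootstrap.servers", "preprod-kafka:9092",
                "group.id", "my-etl-app",  // the Streams application.id / consumer group
                "key.deserializer", ByteArrayDeserializer.class.getName(),
                "value.deserializer", ByteArrayDeserializer.class.getName(),
                "enable.auto.commit", "false");

        // Ask MM2 for the pre-prod equivalents of the prod group's committed offsets.
        Map<TopicPartition, OffsetAndMetadata> translated =
                RemoteClusterUtils.translateOffsets(
                        preProdConfig, "prod", "my-etl-app", Duration.ofSeconds(30));

        // Commit the translated offsets in pre-prod under the group the pre-prod
        // Streams app will use, so it starts reading from a sensible position.
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(preProdConfig)) {
            consumer.assign(translated.keySet());
            consumer.commitSync(translated);
        }
    }
}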

I like both of your ideas around the "user space" solution: subscribing to
multiple topics, or choosing a topic based on config.  However, in order to
populate their internal state properly, when the pre-prod apps come up they
will need to look for repartition and changelog topics with the right
prefix.  This seems problematic to me since the user doesn't have direct
control over those topic names, though it did just occur to me now that the
user *sort of* does.  Since the naming scheme is currently just
applicationId + "-" + storeName + "-changelog", and the application ID is
also the consumer group name, giving the pre-prod app an application ID
that carries the cluster alias prefix would make its internal topic names
match the mirrored ones; we'd then translate the consumer group offsets to
a consumer group with that new, prefixed name.  That seems a bit
clumsy/lucky to me (is the internal topic naming convention really a
"public API"?), but I think it would work.
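
For what it's worth, here's roughly what I picture that looking like on the
Streams side, with an environment flag choosing both the input topic and the
application ID.  The names are all made up for illustration; the only real
assumption is that internal topics keep the applicationId + "-" + storeName
+ "-changelog" naming scheme:

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;

public class EtlApp {
    public static void main(String[] args) {
        boolean preProd = "pre-prod".equals(System.getenv("ENVIRONMENT"));

        // In pre-prod, everything mirrored from prod carries the "prod." alias prefix.
        // Reusing that prefix on the application.id means the consumer group name and
        // the derived repartition/changelog topic names line up with the mirrored topics.
        String prefix = preProd ? "prod." : "";

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, prefix + "my-etl-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG,
                preProd ? "preprod-kafka:9092" : "prod-kafka:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,
                Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,
                Serdes.String().getClass().getName());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream(prefix + "vendor-input")
               .groupByKey()
               .count(Materialized.as("counts"));
        // With application.id = "prod.my-etl-app", the changelog this store restores
        // from is "prod.my-etl-app-counts-changelog", matching the mirrored topic name.

        new KafkaStreams(builder.build(), props).start();
    }
}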

I'd be curious to hear if folks think that solution would work and be an
acceptable pattern, since my original proposal of more user control over
internal topic naming did seem a bit heavy-handed.

Thanks very much for your help!
Paul

On Mon, Mar 25, 2019 at 3:14 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:

> Hey Paul, thanks for the kind words re MM2.
>
> I'm not a Streams expert first off, but I think I understand your question:
> if a Streams app can switch between topics with and without a cluster alias
> prefix when you migrate between prod and pre-prod, while preserving state.
> Streams supports regexes and lists of topics as input, so you can use e.g.
> builder.stream(List.of("topic1", "prod.topic1")), which is a good place to
> start. In this case, the combined subscription is still a single stream,
> conceptually, but comprises partitions from both topics, i.e. partitions
> from topic1 plus partitions from prod.topic1. At a high level, this is no
> different than adding more partitions to a single topic. I think any
> intermediate or downstream topics/tables would remain unchanged, since they
> are still the result of this single stream.
>
> The trick is to correctly translate offsets for the input topics when
> migrating the app between prod and pre-prod, which RemoteClusterUtils can
> help with. You could do this with external tooling, e.g. a script
> leveraging RemoteClusterUtils and kafka-streams-application-reset.sh. I
> haven't tried this with a Streams app myself, but I suspect it would work.
>
> Ryanne
>
>
> On Sun, Mar 24, 2019 at 12:31 PM Paul Whalen <pgwha...@gmail.com> wrote:
>
> > Hi all,
> >
> > With MirrorMaker 2.0 (
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> > )
> > accepted and coming along very nicely in development, it has got me
> > wondering if a certain use case is supported, and if not, can changes be
> > made to Streams or MM2 to support it.  I'll explain the use case, but the
> > TL;DR here is "do we need more control over topic naming in MM2 or
> > Streams?"
> >
> > My team foresees using MM2 as a way to mirror data from our prod
> > environment to a pre-prod environment.  The data is supplied by external
> > vendors, introduced into our system through a Kafka Streams ETL pipeline,
> > and consumed by our end-applications.  Generally we would only like to run
> > the ETL pipeline in prod since there is an operational cost to running it
> > in both prod and pre-prod (the data sometimes needs manual attention).
> > This seems to fit MM2 well: pre-prod end-applications consume from the
> > pre-prod Kafka cluster, which is entirely "remote" topics being mirrored
> > from the prod cluster.  We only have to keep one instance of the ETL
> > pipeline running, but end-applications can be separate, connecting to their
> > respective prod and pre-prod Kafka clusters.
> >
> > However, when we want to test changes to the ETL pipeline itself, we would
> > like to turn off the mirroring from prod to pre-prod, and run the ETL
> > pipeline also in pre-prod, picking up the most recent state of the prod
> > pipeline from when mirroring was turned off (FWIW, downtime is not an issue
> > for our use case).
> >
> > My question/concern is basically, can Streams apps work when they're
> > running against topics prepended with a cluster alias, like
> > "pre-prod.App-statestore-changelog" as is the plan with MM2. From what I
> > can tell the answer is no, and my proposal would be to give the Streams
> > user more specific control over how Streams names its internal topics
> > (repartition and changelogs) by defining an "InternalTopicNamingStrategy"
> > or similar.  Perhaps there is a solution on the MM2 side as well, but it
> > seems much less desirable to budge on that convention.
> >
> > I phrased the question in terms of my team's problem, but it's worth noting
> > that this use case is passably similar to a potential DR use case, where
> > there is a DR cluster that is normally just being mirrored to by MM2, but
> > in a DR scenario would become the active cluster that Streams applications
> > are connected to.
> >
> > Thanks for considering this issue, and great job to those working on MM2 so
> > far!
> >
> > Paul
> >
>
