Any other comments on this topic? - Sijie
On Sun, Mar 10, 2019 at 8:57 PM Jia Zhai <zhaiji...@gmail.com> wrote:

> Thanks @Sijie for the PIP. It has enough details for me. It looks
> great, especially the sidecar approach. Left some comments.
>
> Best Regards.
>
> Jia Zhai
>
> Beijing, China
>
> Mobile: +86 15810491983
>
> On Fri, Mar 8, 2019 at 9:58 PM Sijie Guo <guosi...@gmail.com> wrote:
>
> > Hi Team,
> >
> > I have written down all my thoughts around supporting transactional
> > streaming at Pulsar.
> >
> > https://docs.google.com/document/d/145VYp09JKTw9jAT-7yNyFU255FptB2_B2Fye100ZXDI/edit#heading=h.bm5ainqxosrx
> >
> > Please take a look and feel free to comment on the google doc. We can
> > start from there.
> >
> > Also, apologies in advance if there are any inconsistencies, typos, or
> > language errors in the doc; feel free to fix them.
> >
> > Thanks,
> > Sijie
> >
> > On Tue, Mar 5, 2019 at 1:49 PM Sijie Guo <guosi...@gmail.com> wrote:
> >
> > > Will send the detailed proposal. We can go from there.
> > >
> > > One interesting question I would like to reply to here:
> > >
> > > > But this is more microbatching than streaming.
> > >
> > > I think people usually have a wrong impression about
> > > "microbatching" vs "streaming". That distinction usually comes up
> > > in the context of comparing Spark Streaming with Storm/Flink, and
> > > there it is really about how the computing engine "schedules"
> > > computations.
> > >
> > > In reality, "batching" (microbatching) is almost everywhere in a
> > > "streaming" pipeline, e.g. even in the Pulsar client and the bookie
> > > journal. In the streaming world, you will still do "microbatching"
> > > for many reasons (such as throughput, windowing semantics, and so
> > > on), but the "microbatching" there is not about "scheduling"
> > > anymore.
> > >
> > > - Sijie
> > >
> > > On Tue, Mar 5, 2019 at 4:20 AM Ivan Kelly <iv...@apache.org> wrote:
> > >
> > >> > > My replies inline assume the above, so if you have a
> > >> > > different view of the general shape let me know.
> > >> >
> > >> > Yes. We have the same view of the general shape. I will write
> > >> > down the details of my proposal and share it with the community
> > >> > tomorrow.
> > >>
> > >> Please do. I think there are a lot of details missing.
> > >>
> > >> > Diagrams can easily be drawn to compare the differences here. I
> > >> > will incorporate them into the proposal and show the
> > >> > differences.
> > >>
> > >> Diagrams and a detailed design would be very useful. I still see
> > >> a bunch of unknowns in your design.
> > >>
> > >> > I don't think it is a hard dependency. All these components
> > >> > should be defined by interfaces.
> > >>
> > >> The architecture you're proposing requires a metadata service
> > >> that can scale horizontally in order to scale, so it is a hard
> > >> dependency. An implementation backed by ZK would be only a toy.
> > >>
> > >> > I think I know why you think interleaving is okay now. In your
> > >> > mind, transactions are carrying one message per partition.
> > >>
> > >> Well yes, or only a few. I think we have a very different view of
> > >> how transactions will be used. You seem to be favouring few large
> > >> transactions, whereas what Matteo and I have discussed is many
> > >> smaller transactions, and this informs both designs. With large
> > >> transactions, you're basically microbatching, and you can afford
> > >> to make the individual transactions more expensive since you have
> > >> fewer.
> > >> For many smaller transactions, we need to make the transaction
> > >> itself as cheap as possible.
> > >>
> > >> > A common case of a transaction in streaming is
> > >> > read-transfer-write: read a batch of messages, process them,
> > >> > write the results to Pulsar, and acknowledge the input
> > >> > messages. If you are doing this in a 100ms window, the data can
> > >> > still be large, especially since the results can be several
> > >> > times the size of the input messages. With that being said, in
> > >> > high-throughput transactional streaming, the data per
> > >> > transaction can be large, so storing the entries of the same
> > >> > transaction contiguously will have a huge benefit.
> > >>
> > >> Ok, sure, I'll give you that. If you are reusing a single
> > >> transaction for a lot of inputs, there may be a lot of output
> > >> messages on the same topic. But this is more microbatching than
> > >> streaming.
> > >>
> > >> > Another large category of use cases that should be considered
> > >> > is "batch" data processing. You definitely don't want your
> > >> > future "batch"-ish data processing workloads to jump back and
> > >> > forth in ledgers. There, storing entries of the same
> > >> > transaction in the same partition contiguously will have huge
> > >> > benefits.
> > >>
> > >> Why would you use a transaction here? This is more of a use case
> > >> for an idempotent producer.
> > >>
> > >> > Transaction coordinators are responsible for committing,
> > >> > aborting, and cleaning up transactions.
> > >>
> > >> How, not where. In my experience, the trickiest part of
> > >> transaction systems is the cleanup.
> > >>
> > >> > > I think having a ledger, or a shadow topic, per topic would
> > >> > > work fine. There's no need for indexing. We can already look
> > >> > > up a message by message id. This message id should be part of
> > >> > > the transaction commit message, and subsequently part of the
> > >> > > commit marker written to each topic involved in the
> > >> > > transaction.
> > >> >
> > >> > The assumption you have here is that all the message ids can be
> > >> > stored within one commit message or the commit markers.
> > >>
> > >> Are you expecting to have more than 100k messages per
> > >> transaction?
> > >>
> > >> > > Caching is no different to any other choice of storage for
> > >> > > the message data.
> > >> >
> > >> > Caching behavior is very different from the normal streaming
> > >> > case, since accessing the entries of the same transaction will
> > >> > jump back and forth in the shadow topic.
> > >>
> > >> Why would you be jumping back and forth for the same transaction
> > >> within the same topic? You jump to the first message from that
> > >> topic in the transaction and read forward.
> > >>
> > >> > How are you going to delete the data of aborted transactions in
> > >> > any form of interleaved storage? If we don't compact, the data
> > >> > of aborted transactions will be left there forever.
> > >>
> > >> Aborted transactions should be very rare. The only conflict
> > >> should be on acknowledge, and acknowledgements should only
> > >> conflict if multiple consumers get the same message, which
> > >> shouldn't be the case. In any case, they should be naturally
> > >> cleaned up with the topic itself (see below).
> > >>
> > >> > If you don't rewrite the shadow topic into the main topic, you
> > >> > either have to do compaction or retain the shadow topic
> > >> > forever, no?
> > >>
> > >> No. Let's say that the maximum age of a single transaction is 1h.
> > >> That means you only need to retain one hour more of data in the
> > >> shadow topic than you would have to retain in the main topic. Of
> > >> course we wouldn't use wall-clock time for this, but something
> > >> based on the low watermark; that's the basic idea. I haven't
> > >> worked out all the details.
> > >>
> > >> I look forward to seeing your full design.
> > >>
> > >> -Ivan
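
To make Sijie's point concrete that batching already happens transparently
inside the Pulsar client itself, here is a minimal Java sketch of a producer
with batching enabled, using the standard Pulsar client builder API. The
service URL, topic name, and tuning values are illustrative.

    import java.util.concurrent.TimeUnit;

    import org.apache.pulsar.client.api.Producer;
    import org.apache.pulsar.client.api.PulsarClient;

    public class BatchingProducerExample {
        public static void main(String[] args) throws Exception {
            PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://localhost:6650") // illustrative address
                    .build();

            // The client "microbatches" transparently: messages published
            // within the delay window are grouped into a single batch entry
            // for throughput, independent of any engine-level scheduling.
            Producer<byte[]> producer = client.newProducer()
                    .topic("persistent://public/default/my-topic")
                    .enableBatching(true)
                    .batchingMaxPublishDelay(10, TimeUnit.MILLISECONDS)
                    .batchingMaxMessages(1000)
                    .create();

            for (int i = 0; i < 10; i++) {
                producer.sendAsync(("msg-" + i).getBytes());
            }
            producer.flush(); // force out any partially filled batch

            producer.close();
            client.close();
        }
    }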
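The read-transfer-write pattern Sijie describes is the canonical
transactional-streaming loop. As a sketch only: this thread predates the
implementation, so the snippet below assumes an API shaped like the one
PIP-31 eventually produced (a client-side Transaction handle, transactional
sends, and transactional acknowledgements); topic names and the batch bound
are illustrative.

    import java.util.concurrent.TimeUnit;

    import org.apache.pulsar.client.api.Consumer;
    import org.apache.pulsar.client.api.Message;
    import org.apache.pulsar.client.api.Producer;
    import org.apache.pulsar.client.api.PulsarClient;
    import org.apache.pulsar.client.api.transaction.Transaction;

    public class ReadTransferWrite {
        public static void main(String[] args) throws Exception {
            PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://localhost:6650")
                    .enableTransaction(true) // transactions enabled on the client
                    .build();

            Consumer<byte[]> consumer = client.newConsumer()
                    .topic("input-topic")
                    .subscriptionName("transfer-sub")
                    .subscribe();
            Producer<byte[]> producer = client.newProducer()
                    .topic("output-topic")
                    .create();

            // One read-transfer-write cycle: consume a batch, produce the
            // results, and acknowledge the inputs, all in one transaction.
            Transaction txn = client.newTransaction()
                    .withTransactionTimeout(5, TimeUnit.MINUTES)
                    .build()
                    .get();

            for (int i = 0; i < 100; i++) {
                Message<byte[]> msg = consumer.receive(100, TimeUnit.MILLISECONDS);
                if (msg == null) {
                    break; // no more input in this window
                }
                producer.newMessage(txn).value(process(msg.getValue())).sendAsync();
                consumer.acknowledgeAsync(msg.getMessageId(), txn);
            }

            // Outputs and acknowledgements become visible atomically.
            txn.commit().get();

            client.close();
        }

        private static byte[] process(byte[] input) {
            return input; // placeholder transformation
        }
    }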
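On Ivan's point that the message ids of a transaction could travel in the
commit message and the per-topic commit marker, a minimal sketch of what
such a marker might carry is below. The class and its fields are
hypothetical, not Pulsar's actual wire format; Sijie's objection is
precisely about whether committedMessageIds stays small enough to fit in
one marker.

    import java.util.List;

    import org.apache.pulsar.client.api.MessageId;

    // Hypothetical per-topic commit marker: written to each topic involved
    // in the transaction so that readers can resolve the transaction's
    // messages directly by id, without a separate index.
    final class CommitMarker {
        final long txnIdMostBits;                  // transaction id, high bits
        final long txnIdLeastBits;                 // transaction id, low bits
        final List<MessageId> committedMessageIds; // this topic's messages in the txn

        CommitMarker(long txnIdMostBits, long txnIdLeastBits,
                     List<MessageId> committedMessageIds) {
            this.txnIdMostBits = txnIdMostBits;
            this.txnIdLeastBits = txnIdLeastBits;
            this.committedMessageIds = List.copyOf(committedMessageIds);
        }
    }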
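Ivan's closing retention argument reduces to simple arithmetic: if every
transaction must commit or abort within some maximum age, then once an
entry in the shadow topic is older than the main topic's retention plus
that bound, no commit marker can still reference it and it can be
truncated. A small sketch; the class and method names are illustrative.

    import java.time.Duration;

    final class ShadowTopicRetention {
        // An entry older than this can no longer be referenced by any
        // commit marker, so the shadow topic is free to truncate it.
        static Duration shadowRetention(Duration mainTopicRetention,
                                        Duration maxTransactionAge) {
            return mainTopicRetention.plus(maxTransactionAge);
        }

        public static void main(String[] args) {
            Duration main = Duration.ofHours(24);  // main topic retention
            Duration maxTxn = Duration.ofHours(1); // Ivan's example bound
            System.out.println(shadowRetention(main, maxTxn)); // PT25H
        }
    }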