agreed. finishBundle() helps but can not guarantee consistent state.

On Tue, May 3, 2016 at 1:49 AM, Maximilian Michels <[email protected]> wrote:

> Correct, Kafka doesn't support rollbacks of the producer. In Flink
> there is the RollingSink which supports transactional rolling files.
> Admittedly, that is the only one. Still, checkpointing sinks in Beam
> could be useful for users who are concerned about exactly once
> semantics. I'm not sure whether we can implement something similar
> with the bundle mechanism.
>
> On Mon, May 2, 2016 at 11:50 PM, Raghu Angadi
> <[email protected]> wrote:
> > What are good examples of streaming sinks that support checkpointing (or
> > transactions/rollbacks)? I don't Kafka supports a rollback.
> >
> > On Mon, May 2, 2016 at 2:54 AM, Maximilian Michels <[email protected]>
> wrote:
> >
> >> Yes, I would expect sinks to provide similar additional interfaces
> >> like sources, e.g. checkpointing. We could also use the
> >> startBundle/processElement/finishBundle lifecycle methods to implement
> >> checkpointing. I just wonder, if we want to make it more explicit.
> >> Also, does it make sense that sinks can return a PCollection? You can
> >> return PDone but you don't have to.
> >>
> >> Since sinks are fundamental in streaming pipelines, it just seemed odd
> >> to me that there is not dedicated interface. I understand a bit
> >> clearer now that it is not viewed as crucial because we can use
> >> existing primitives to create sinks. In a way, that might be elegant
> >> but also less explicit.
> >>
> >> On Fri, Apr 29, 2016 at 11:00 PM, Frances Perry <[email protected]
> >
> >> wrote:
> >> >>
> >> >> @Frances Sources are not simple DoFns. They add additional
> >> >> functionality, e.g. checkpointing, watermark generation, creating
> >> >> splits. If we want sinks to be portable, we should think about a
> >> >> dedicated interface. At least for the checkpointing.
> >> >>
> >> >
> >> > We might be mixing sources and sinks in this conversation. ;-) Sources
> >> > definitely provide additional functionality as you mentioned. But at
> >> least
> >> > currently, sinks don't provide any new primitive functionality. Are
> you
> >> > suggestion there needs to be a checkpointing interface for sinks
> beyond
> >> > DoFn's bundle finalization? (Note that the existing Write for batch is
> >> just
> >> > a PTransform based around ParDo.)
> >>
>

Reply via email to