agreed. finishBundle() helps but can not guarantee consistent state. On Tue, May 3, 2016 at 1:49 AM, Maximilian Michels <[email protected]> wrote:
> Correct, Kafka doesn't support rollbacks of the producer. In Flink > there is the RollingSink which supports transactional rolling files. > Admittedly, that is the only one. Still, checkpointing sinks in Beam > could be useful for users who are concerned about exactly once > semantics. I'm not sure whether we can implement something similar > with the bundle mechanism. > > On Mon, May 2, 2016 at 11:50 PM, Raghu Angadi > <[email protected]> wrote: > > What are good examples of streaming sinks that support checkpointing (or > > transactions/rollbacks)? I don't Kafka supports a rollback. > > > > On Mon, May 2, 2016 at 2:54 AM, Maximilian Michels <[email protected]> > wrote: > > > >> Yes, I would expect sinks to provide similar additional interfaces > >> like sources, e.g. checkpointing. We could also use the > >> startBundle/processElement/finishBundle lifecycle methods to implement > >> checkpointing. I just wonder, if we want to make it more explicit. > >> Also, does it make sense that sinks can return a PCollection? You can > >> return PDone but you don't have to. > >> > >> Since sinks are fundamental in streaming pipelines, it just seemed odd > >> to me that there is not dedicated interface. I understand a bit > >> clearer now that it is not viewed as crucial because we can use > >> existing primitives to create sinks. In a way, that might be elegant > >> but also less explicit. > >> > >> On Fri, Apr 29, 2016 at 11:00 PM, Frances Perry <[email protected] > > > >> wrote: > >> >> > >> >> @Frances Sources are not simple DoFns. They add additional > >> >> functionality, e.g. checkpointing, watermark generation, creating > >> >> splits. If we want sinks to be portable, we should think about a > >> >> dedicated interface. At least for the checkpointing. > >> >> > >> > > >> > We might be mixing sources and sinks in this conversation. ;-) Sources > >> > definitely provide additional functionality as you mentioned. But at > >> least > >> > currently, sinks don't provide any new primitive functionality. Are > you > >> > suggestion there needs to be a checkpointing interface for sinks > beyond > >> > DoFn's bundle finalization? (Note that the existing Write for batch is > >> just > >> > a PTransform based around ParDo.) > >> >
