Hi Chris


On Thu, Jan 29, 2015 at 9:10 AM, Chris Riccomini <criccom...@apache.org>
wrote:

> Hey Jae,
>
> If I understand you correctly, your concern is that there could be flushes
> in-between commits. For example:
>
>   T=30s; flush
>   T=45s; flush
>   T=60s; flush && commit
>   T=65s; flush
>
> Your concern here is that if there's a failure before 60s, the messages
> that were flushed at 30s and 45s will be duplicated when the container
> reprocesses, right?


Correct

>


> > Never mind. I found a solution. Flush should be synced with commit.
>

Last night, I was sleepy and struggling with finding a solution, so this
morning, it turned out to be wrong :(
My idea was, send() function does not call flush() even though the buffer
is full. But this is risky.

Actually, I was writing our internal data pipeline component as StreamTask
but I switched it to SystemProducer as Metamx Druid Tranquility did. But I
overlook that duplicate data which can be caused by flush & commit mismatch.

Do you have any idea?

>
> Could you elaborate on this?
>
> Cheers,
> Chris
>
> On Thu, Jan 29, 2015 at 12:27 AM, Bae, Jae Hyeon <metac...@gmail.com>
> wrote:
>
> > Never mind. I found a solution. Flush should be synced with commit.
> >
> > On Thu, Jan 29, 2015 at 12:15 AM, Bae, Jae Hyeon <metac...@gmail.com>
> > wrote:
> >
> > > Hi Samza Devs
> > >
> > > StreamTask can control SamzaContainer.commit() through task
> coordinator.
> > > Can we make SystemProducer control commit after flush? With this
> feature,
> > > we can prevent any duplicate data on SamzaContainer failure.
> > >
> > > For example, if we set commit interval as 2 minutes, before commit time
> > > interval expires, when its buffer size is greater than batch size,
> > > SystemProducer will flush data in the buffer. Right after flush, when
> the
> > > container dies, another container will start from the previous commit.
> > > Then, we will have duplicate data.
> > >
> > > If we have longer commit interval, we will have more duplicate data. I
> > > know this is not a big deal because container failure will be rare case
> > and
> > > just a few minutes data will be duplicated. But I will be happy if we
> can
> > > clear this little concern.
> > >
> > > Any idea?
> > >
> > > Thank you
> > > Best, Jae
> > >
> >
>

Reply via email to