Never mind, I found a solution: the SystemProducer flush should be synced with the commit.
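
To make that concrete, here is a minimal sketch in plain Java. It is a toy simulation, not Samza code: the class FlushCommitSketch, the method run, and all of its parameters are hypothetical names made up for illustration. It replays a list of messages through a buffered producer with one simulated container crash, and compares flushing whenever the batch fills against flushing only at commit time. It also assumes that the flush and the checkpoint write happen as one step, which a real container does not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these names come from the Samza API.
final class FlushCommitSketch {

    /**
     * Replays `input` through a buffered producer with one simulated crash.
     *
     * @param batchSize         buffer size that triggers an early flush
     * @param commitEvery       checkpoint after every N processed messages
     * @param crashAfter        the container dies once after this many messages
     * @param flushOnlyOnCommit if true, the buffer is flushed only at commit time
     * @return everything that reached the output system, in order
     */
    static List<String> run(List<String> input, int batchSize, int commitEvery,
                            int crashAfter, boolean flushOnlyOnCommit) {
        List<String> output = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        int checkpoint = 0;      // offset to resume from after a restart
        int offset = 0;          // next input offset to process
        boolean crashed = false; // the crash happens at most once

        while (offset < input.size()) {
            buffer.add(input.get(offset));
            offset++;

            // Early flush: data leaves the buffer before the checkpoint moves.
            if (!flushOnlyOnCommit && buffer.size() >= batchSize) {
                output.addAll(buffer);
                buffer.clear();
            }

            // Commit: flush whatever is buffered, then advance the checkpoint.
            if (offset % commitEvery == 0) {
                output.addAll(buffer);
                buffer.clear();
                checkpoint = offset;
            }

            // Crash: unflushed data is lost and processing restarts
            // from the last checkpoint.
            if (!crashed && offset == crashAfter) {
                crashed = true;
                buffer.clear();
                offset = checkpoint;
            }
        }

        // Clean shutdown: flush the remainder.
        output.addAll(buffer);
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("m0", "m1", "m2", "m3", "m4", "m5");
        System.out.println("early flush:  " + run(input, 2, 4, 3, false));
        System.out.println("synced flush: " + run(input, 2, 4, 3, true));
    }
}
```

With a batch size of 2, a commit every 4 messages, and a crash after the third message, the early-flush run emits m0 and m1 twice, while the synced run emits each message once. In a real container a crash between the flush and the checkpoint write can still produce duplicates, so this narrows the window rather than closing it.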

On Thu, Jan 29, 2015 at 12:15 AM, Bae, Jae Hyeon <metac...@gmail.com> wrote:

> Hi Samza Devs
>
> A StreamTask can trigger SamzaContainer.commit() through the task
> coordinator. Could we let SystemProducer trigger a commit right after it
> flushes? With this feature, we could prevent duplicate data when a
> SamzaContainer fails.
>
> For example, suppose the commit interval is 2 minutes. If the buffer
> grows past the batch size before the interval expires, SystemProducer
> flushes the buffered data. If the container dies right after that flush,
> another container starts from the previous commit and reprocesses the
> same messages, so we end up with duplicate data.
>
> The longer the commit interval, the more duplicate data we get. I know
> this is not a big deal, because container failures are rare and only a
> few minutes of data would be duplicated, but I would be happy if we could
> clear up this small concern.
>
> Any ideas?
>
> Thank you
> Best, Jae
>
