Hi Samza Devs

A StreamTask can trigger SamzaContainer.commit() through the task coordinator.
Could we also let a SystemProducer trigger a commit right after it flushes?
With this feature, we could avoid duplicate data when a SamzaContainer fails.

For example, suppose the commit interval is 2 minutes. Before the interval
expires, the SystemProducer flushes its buffer whenever the buffer size
exceeds the batch size. If the container dies right after such a flush, the
replacement container starts from the previous commit and reprocesses
messages whose output was already flushed, so we end up with duplicate data.
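
To make the scenario concrete, here is a rough toy model in plain Java. It
is not Samza code: the class name, the method, and the use of message counts
in place of a time-based commit interval are all my own assumptions for
illustration. It also assumes that a commit flushes the producer before
writing the checkpoint.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the duplicate window. Hypothetical names, not Samza APIs. */
public class DuplicateWindowSketch {

    /**
     * Counts messages that reach the output and will be written again after a crash.
     *
     * @param commitEvery checkpoint every N processed messages (stand-in for the commit interval)
     * @param batchSize   the producer flushes once its buffer holds this many messages
     * @param crashAfter  the container dies after processing this many messages
     */
    public static int duplicatesAfterCrash(int commitEvery, int batchSize, int crashAfter) {
        List<Integer> output = new ArrayList<>();  // offsets already visible downstream
        List<Integer> buffer = new ArrayList<>();  // offsets still buffered in the producer
        int committedOffset = 0;                   // restart position after a failure

        for (int offset = 0; offset < crashAfter; offset++) {
            buffer.add(offset);

            // Size-triggered flush: output becomes visible, but no checkpoint is written.
            if (buffer.size() >= batchSize) {
                output.addAll(buffer);
                buffer.clear();
            }

            // Periodic commit: assumed to flush first, then record the checkpoint.
            if ((offset + 1) % commitEvery == 0) {
                output.addAll(buffer);
                buffer.clear();
                committedOffset = offset + 1;
            }
        }

        // After the crash, everything at or past committedOffset is reprocessed.
        // Whatever was already flushed from that range is emitted a second time.
        int duplicates = 0;
        for (int flushed : output) {
            if (flushed >= committedOffset) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Commit far less often than the producer flushes.
        System.out.println(duplicatesAfterCrash(1000, 100, 950));
        // Commit aligned with every flush, which is the behavior proposed here.
        System.out.println(duplicatesAfterCrash(100, 100, 950));
    }
}
```

In this model the first call reports 900 duplicated messages and the second
reports 0, because every flush in the second case is immediately followed by
a checkpoint.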

The longer the commit interval, the more duplicate data we get. I know this
is not a big deal, because container failures are rare and only a few
minutes of data would be duplicated. But I would be happy if we could clear
up this small concern.

Any ideas?

Thank you
Best, Jae
