Hi Samza devs,

A StreamTask can trigger SamzaContainer.commit() through the task coordinator. Could we let a SystemProducer trigger a commit right after it flushes as well? With this feature, we could prevent duplicate data when a SamzaContainer fails.
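
To make the concern concrete, here is a toy Python simulation of a flush that happens before the next commit, followed by a container crash. It is only a sketch and not Samza code: `simulate`, `duplicates`, and all of their parameters are hypothetical names, the commit interval is counted in messages rather than minutes, and I assume that a regular commit flushes the buffer before it checkpoints.

```python
def simulate(num_messages, batch_size, commit_interval, crash_at, commit_on_flush):
    """Toy model of one container crash; returns everything written downstream.

    crash_at: the first container dies after processing this many messages.
    commit_on_flush: if True, checkpoint the input offset on every flush
    (the proposed behaviour); if False, only on the regular commit interval.
    """
    output = []      # what the downstream system has received
    committed = 0    # checkpointed input offset (next message to read)

    def run(start, stop):
        nonlocal committed
        buffer = []
        for offset in range(start, stop):
            buffer.append(offset)
            if len(buffer) >= batch_size:
                # Size-triggered flush by the producer.
                output.extend(buffer)
                buffer.clear()
                if commit_on_flush:
                    committed = offset + 1
            if (offset + 1) % commit_interval == 0:
                # Regular commit: flush first, then checkpoint (assumed).
                output.extend(buffer)
                buffer.clear()
                committed = offset + 1
        return buffer

    run(0, crash_at)                        # first container; its unflushed buffer is lost
    leftover = run(committed, num_messages)  # replacement starts from the last commit
    output.extend(leftover)                 # final flush at clean shutdown
    return output


def duplicates(output):
    """Number of messages that reached the output more than once."""
    return len(output) - len(set(output))


print(duplicates(simulate(10, 3, 8, 7, commit_on_flush=False)))  # 6
print(duplicates(simulate(10, 3, 8, 7, commit_on_flush=True)))   # 0
```

Note that the model treats flush and commit as a single atomic step. A real container could still die between the two, so committing after flush would shrink the duplicate window rather than remove it completely.
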

For example, suppose the commit interval is 2 minutes. Before that interval expires, the SystemProducer flushes its buffer whenever the buffered data exceeds the batch size. If the container dies right after such a flush, the replacement container starts from the previous commit and sends the already-flushed data again, so we end up with duplicate data. The longer the commit interval, the more duplicate data we get.

I know this is not a big deal, because container failures should be rare and only a few minutes of data would be duplicated. But I would be happy if we could clear up this small concern. Any ideas?

Thank you.

Best,
Jae