Never mind. I found a solution. Flush should be synced with commit.

On Thu, Jan 29, 2015 at 12:15 AM, Bae, Jae Hyeon <metac...@gmail.com> wrote:
> Hi Samza Devs,
>
> A StreamTask can control SamzaContainer.commit() through the task coordinator.
> Can we make SystemProducer trigger a commit after a flush? With this feature,
> we could prevent duplicate data on SamzaContainer failure.
>
> For example, suppose we set the commit interval to 2 minutes. Before the
> commit interval expires, whenever the buffer size exceeds the batch size,
> SystemProducer will flush the data in the buffer. If the container dies right
> after that flush, another container will start from the previous commit, and
> we will have duplicate data.
>
> The longer the commit interval, the more duplicate data we will have. I
> know this is not a big deal, because container failure will be a rare case and
> only a few minutes of data will be duplicated. But I would be happy if we could
> clear up this little concern.
>
> Any ideas?
>
> Thank you
> Best, Jae
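
To make the scenario above concrete, here is a minimal, self-contained simulation in plain Java. It does not use any Samza classes: `FlushCommitSketch`, `process`, and `duplicatesAfterCrash` are hypothetical names made up for this sketch, the "sink" is just a list, and the commit is triggered by message count rather than by a timer. It only illustrates why flushing between commits can produce duplicates after a restart, and why flushing only at commit time avoids them in this simplified model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation; none of these names come from Samza.
public class FlushCommitSketch {

    // Processes offsets [from, to), buffering messages and flushing them to the sink.
    // Returns the last committed offset. Whatever is still buffered at the end
    // is dropped, which models a container dying with unflushed data in memory.
    static int process(int from, int to, int committed, int batchSize,
                       int commitEvery, boolean flushOnlyOnCommit, List<Integer> sink) {
        List<Integer> buffer = new ArrayList<>();
        for (int offset = from; offset < to; offset++) {
            buffer.add(offset);
            boolean commitDue = (offset + 1) % commitEvery == 0;
            boolean batchFull = buffer.size() >= batchSize;
            // Either flush only together with a commit, or also whenever the batch fills up.
            if (commitDue || (!flushOnlyOnCommit && batchFull)) {
                sink.addAll(buffer);
                buffer.clear();
            }
            if (commitDue) {
                committed = offset + 1; // checkpoint taken right after the flush
            }
        }
        return committed;
    }

    // Runs until crashAt, restarts from the last commit, and counts duplicates in the sink.
    // Assumes total is a multiple of commitEvery so that every message is eventually flushed.
    static int duplicatesAfterCrash(int total, int batchSize, int commitEvery,
                                    int crashAt, boolean flushOnlyOnCommit) {
        List<Integer> sink = new ArrayList<>();
        int committed = process(0, crashAt, 0, batchSize, commitEvery, flushOnlyOnCommit, sink);
        process(committed, total, committed, batchSize, commitEvery, flushOnlyOnCommit, sink);
        return sink.size() - total;
    }

    public static void main(String[] args) {
        System.out.println("flush on batch size: "
                + duplicatesAfterCrash(10, 2, 5, 8, false) + " duplicates");
        System.out.println("flush only on commit: "
                + duplicatesAfterCrash(10, 2, 5, 8, true) + " duplicates");
    }
}
```

Note that the sketch only crashes between messages. In a real system a flush followed by a commit is not atomic, so a failure between the two steps could still replay the last flushed batch.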