Elias Levy created SAMZA-856:
--------------------------------
Summary: Optionally automatically commit based on number of
processed messages
Key: SAMZA-856
URL: https://issues.apache.org/jira/browse/SAMZA-856
Project: Samza
Issue Type: Improvement
Components: container
Reporter: Elias Levy
Currently Samza support automatic checkpoint commits based on time via the
task.commit.ms property. The number of messages processed during any time
window will vary with the throughput of the system. Thus, the current
automatic checkpointing can't guarantee a maximum number of messages being
reprocessed when recovering after a failure.
I propose the addition of an option that would automatically commit checkpoints
after a configurable number of messages have been processed. The messages
could be counted per container, per task, or per stream. Properties could be
named task.commit.msg.container.cnt, task.commit.msg.task.cnt and/or
task.commit.msg.stream.cnt.
Alternatively, a per stream count limit could use different values for
different streams. E.g. task.commit.msg.stream.<some_stream>.cnt=1000,
task.commit.msg.stream.<some_other_stream>.cnt=200.
A message count auto commit would be orthogonal to the existing time based auto
commit and they could be used at the same time.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)