Elias Levy created SAMZA-856:
--------------------------------

             Summary: Optionally automatically commit based on number of 
processed messages
                 Key: SAMZA-856
                 URL: https://issues.apache.org/jira/browse/SAMZA-856
             Project: Samza
          Issue Type: Improvement
          Components: container
            Reporter: Elias Levy


Currently Samza support automatic checkpoint commits based on time via the 
task.commit.ms property.  The number of messages processed during any time 
window will vary with the throughput of the system.  Thus, the current 
automatic checkpointing can't guarantee a maximum number of messages being 
reprocessed when recovering after a failure.

I propose the addition of an option that would automatically commit checkpoints 
 after a configurable number of messages have been processed.  The messages 
could be counted per container, per task, or per stream. Properties could be 
named task.commit.msg.container.cnt, task.commit.msg.task.cnt and/or 
task.commit.msg.stream.cnt.

Alternatively, a per stream count limit could use different values for 
different streams. E.g. task.commit.msg.stream.<some_stream>.cnt=1000, 
task.commit.msg.stream.<some_other_stream>.cnt=200.

A message count auto commit would be orthogonal to the existing time based auto 
commit and they could be used at the same time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to