[
https://issues.apache.org/jira/browse/SAMZA-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297872#comment-15297872
]
Branislav Cogic commented on SAMZA-856:
---------------------------------------
The patch is attached.
Here is a RB link: https://reviews.apache.org/r/47761
> Optionally automatically commit based on number of processed messages
> ---------------------------------------------------------------------
>
> Key: SAMZA-856
> URL: https://issues.apache.org/jira/browse/SAMZA-856
> Project: Samza
> Issue Type: Improvement
> Components: container
> Reporter: Elias Levy
> Assignee: Branislav Cogic
> Attachments: SAMZA-856.0.patch
>
>
> Currently Samza support automatic checkpoint commits based on time via the
> task.commit.ms property. The number of messages processed during any time
> window will vary with the throughput of the system. Thus, the current
> automatic checkpointing can't guarantee a maximum number of messages being
> reprocessed when recovering after a failure.
> I propose the addition of an option that would automatically commit
> checkpoints after a configurable number of messages have been processed.
> The messages could be counted per container, per task, or per stream.
> Properties could be named task.commit.msg.container.cnt,
> task.commit.msg.task.cnt and/or task.commit.msg.stream.cnt.
> Alternatively, a per stream count limit could use different values for
> different streams. E.g. task.commit.msg.stream.<some_stream>.cnt=1000,
> task.commit.msg.stream.<some_other_stream>.cnt=200.
> A message count auto commit would be orthogonal to the existing time based
> auto commit and they could be used at the same time.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)