[
https://issues.apache.org/jira/browse/SAMZA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123784#comment-14123784
]
Chris Riccomini commented on SAMZA-405:
---------------------------------------
bq. Ability to store and replay message chooser history. Samza could have a
configuration option to save a history of the messages a task has processed.
This log could be used during recovery or rewind to replay messages in a
deterministic order.
If we have Kafka transactionality, and atomically commit all changelog
messages, output messages, and checkpoint messages as a single transaction, I'm
not entirely sure that we need this. In such a case, we will only ever recover
on transaction boundaries, so we shouldn't care about what the MessageChooser
picked before the failure, because nothing was output by the prior container
(due to a failed transaction). The one area where this could be useful is if
you're writing to something other than Kafka, and therefore don't have
transactions. In such a case, you could mimic the idempotent producer if your
inputs were deterministic and your processing were also deterministic. Is this
along the lines of what you were thinking?
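
To make the non-Kafka case concrete, here is a minimal sketch, assuming a hypothetical sink that can deduplicate on a caller-supplied id. The names IdempotentSink and process are invented for illustration and are not part of Samza or Kafka. The point is only that if the output id is derived deterministically from the input position, a replay after failure rewrites the same ids and the sink can drop them.

```python
class IdempotentSink:
    """Hypothetical non-transactional sink that ignores writes it has already seen."""

    def __init__(self):
        self.rows = {}        # output_id -> value
        self.duplicates = 0   # count of replayed writes that were dropped

    def write(self, output_id, value):
        if output_id in self.rows:
            self.duplicates += 1  # replayed write; drop it
            return
        self.rows[output_id] = value


def process(sink, messages, start=0):
    """Deterministic processing: the output id depends only on the input position."""
    for partition, offset, payload in messages[start:]:
        sink.write((partition, offset), payload.upper())


messages = [("p0", 0, "a"), ("p0", 1, "b"), ("p0", 2, "c")]
sink = IdempotentSink()
process(sink, messages)            # first run writes all three outputs
process(sink, messages, start=1)   # restart from an older checkpoint replays two
```

This only works if both the inputs and the processing are deterministic; if the replayed run could produce a different value for the same id, dropping the duplicate would hide a real difference.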
> Trying for deterministic behavior on recovery and rewind
> --------------------------------------------------------
>
> Key: SAMZA-405
> URL: https://issues.apache.org/jira/browse/SAMZA-405
> Project: Samza
> Issue Type: Improvement
> Reporter: Roger Hoover
>
> Ideally, we want streaming tasks to produce the exact same output on recovery
> or rewind as they did/would during normal operation. After thinking harder
> on this, I don't believe it's possible with at-least-once semantics. I think
> duplicates break ordering guarantees. Any message that updates local
> state can always be surrounded on both sides by duplicates of another
> message that negates it. Nonetheless, we can get closer now, and if
> Kafka later supports idempotent producers, we'll have what we want.
> See discussion here:
> http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201409.mbox/%3CCAPOm=tpsevpxludaxintxj9z54gyeanocv5dk7nbotsgpd-...@mail.gmail.com%3E
> Here are three changes that seem to make sense for Samza to support in order
> to achieve this.
> 1) Bootstrapping is only appropriate on cold start, not when restoring saved
> state. On recovery, local state will be restored from the change log.
> 2) Local state should be saved and restored atomically with checkpoint state.
> This may require support for transactions in Kafka.
> 3) Ability to store and replay message chooser history. Samza could have a
> configuration option to save a history of the messages a task has processed.
> This log could be used during recovery or rewind to replay messages in a
> deterministic order.
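
As an illustration of item 3, the following is a rough sketch of recording and replaying a chooser history. It does not use Samza's MessageChooser interface; the run function, the stream names, and the random choice standing in for a nondeterministic chooser are all assumptions made for the example.

```python
import random
from collections import deque


def run(streams, history=None, seed=None):
    """Interleave several input streams, recording or replaying the choice order.

    streams: dict of stream name -> list of messages (hypothetical inputs).
    history: list of stream names recorded by an earlier run over the same
             inputs, or None to let the chooser pick freely.
    Returns (messages in processing order, list of stream names chosen).
    """
    queues = {name: deque(msgs) for name, msgs in streams.items()}
    rng = random.Random(seed)
    replay = deque(history or [])
    processed, chosen = [], []
    while any(queues.values()):
        if replay:
            name = replay.popleft()       # follow the recorded choice
        else:
            ready = sorted(n for n, q in queues.items() if q)
            name = rng.choice(ready)      # stand-in for a nondeterministic chooser
        processed.append(queues[name].popleft())
        chosen.append(name)
    return processed, chosen


streams = {"clicks": ["c1", "c2"], "views": ["v1", "v2", "v3"]}
first, log = run(streams, seed=1)              # normal operation, history saved
again, _ = run(streams, history=log, seed=2)   # recovery replays the saved history
```

With the saved history, the second run processes messages in the same order as the first even though its own chooser would have picked differently. The sketch assumes the replayed inputs are identical to the original ones; a real implementation would also need to store the history durably and decide how far back to keep it.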