[ https://issues.apache.org/jira/browse/SAMZA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125751#comment-14125751 ]
Roger Hoover commented on SAMZA-405:
------------------------------------
[~criccomini],
Good point. If the outputs are also written as part of a transaction, then the
ability to replay is not necessary for recovery. I was thinking about it in
the context of rewind, where you want to re-process streams with new logic in
place, as sketched out here
(http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html).
To A/B test your new job logic against your old job logic, you may want
them both to process messages in the same order.
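Why the order matters for an A/B comparison can be shown with a small Python sketch. The task, the stream names (`profiles`, `views`), and both versions of the logic are hypothetical, not Samza APIs: the old job enriches page views with the latest profile seen so far, and the new job differs only in how it treats unknown users.

```python
# Hypothetical stateful task: enrich page views with the latest profile
# seen so far. Messages are (stream, user, payload) tuples.
def enrich(messages):
    profiles = {}
    out = []
    for stream, user, payload in messages:
        if stream == "profiles":
            profiles[user] = payload
        else:  # "views"
            out.append((user, payload, profiles.get(user)))
    return out

# Hypothetical "new logic": identical, except unknown users default to "free".
def enrich_v2(messages):
    profiles = {}
    out = []
    for stream, user, payload in messages:
        if stream == "profiles":
            profiles[user] = payload
        else:  # "views"
            out.append((user, payload, profiles.get(user, "free")))
    return out

profile_update = ("profiles", "u1", "premium")
page_view = ("views", "u1", "/home")
order_1 = [profile_update, page_view]
order_2 = [page_view, profile_update]

print(enrich(order_1))     # [('u1', '/home', 'premium')]
print(enrich(order_2))     # [('u1', '/home', None)]
print(enrich_v2(order_1))  # [('u1', '/home', 'premium')]
print(enrich_v2(order_2))  # [('u1', '/home', 'free')]
```

Comparing `enrich(order_1)` with `enrich_v2(order_2)` reports a difference that comes entirely from the interleaving, not from the change in logic; running both versions over the same order isolates the logic change.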
> Trying for deterministic behavior on recovery and rewind
> --------------------------------------------------------
>
> Key: SAMZA-405
> URL: https://issues.apache.org/jira/browse/SAMZA-405
> Project: Samza
> Issue Type: Improvement
> Reporter: Roger Hoover
>
> Ideally, we want streaming tasks to produce the exact same output on recovery
> or rewind as they did/would during normal operation. After thinking harder
> about this, I don't believe it's possible with at-least-once semantics. I think
> duplicates break ordering guarantees. Any message that updates local state
> can always be surrounded on both sides by duplicates of another message that
> negates it. Nonetheless, we can get closer now, and if Kafka later supports
> idempotent producers, we'll have what we want.
> See discussion here:
> http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201409.mbox/%3CCAPOm=tpsevpxludaxintxj9z54gyeanocv5dk7nbotsgpd-...@mail.gmail.com%3E
> Here are three changes that seem to make sense for Samza to support in order
> to achieve this.
> 1) Bootstrapping is only appropriate on cold start, not when restoring saved
> state. On recovery, local state will be restored from the change log.
> 2) Local state should be saved and restored atomically with checkpoint state.
> This may require support for transactions in Kafka.
> 3) Ability to store and replay message chooser history. Samza could have a
> configuration option to save a history of the messages a task has processed.
> This log could be used during recovery or rewind to replay messages in a
> deterministic order.
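The duplicate argument in the description can be made concrete with a toy key-value store. In this hypothetical Python sketch, message `A` puts `x = 1` and message `B` deletes `x`. The assumed scenario: normal processing saw `B` then `A`, the task failed before checkpointing, its restored local state already reflected both messages, and on replay the chooser happened to pick `A` before `B`, so the store effectively sees `B` on both sides of `A`.

```python
# Toy local store: ("put", key, value) sets a key, ("delete", key) removes it.
def apply(state, msg):
    if msg[0] == "put":
        _, key, value = msg
        state[key] = value
    elif msg[0] == "delete":
        _, key = msg
        state.pop(key, None)

def final_state(sequence):
    """Apply messages in order to an empty store and return the result."""
    state = {}
    for msg in sequence:
        apply(state, msg)
    return state

A = ("put", "x", 1)
B = ("delete", "x")

normal = [B, A]                     # what the task saw before the failure
recovered_reordered = [B, A, A, B]  # restored B, A; replay chose A, then B
recovered_in_order = [B, A, B, A]   # restored B, A; replay kept the original order

print(final_state(normal))               # {'x': 1}
print(final_state(recovered_reordered))  # {}
print(final_state(recovered_in_order))   # {'x': 1}
```

In this case, following the original order on replay recovers `{'x': 1}`, and saving state atomically with the checkpoint (item 2) would mean the restored store does not already reflect `B` and `A`, so they are applied only once.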
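Item 3 could look roughly like the following Python sketch. `RecordingChooser`, `ReplayingChooser`, and `run` are hypothetical stand-ins, not Samza's actual chooser interface: during normal operation the chooser picks the next input stream (randomly here) and logs which stream it picked; during recovery or rewind a replaying chooser follows that log instead. It assumes each input is a single ordered partition, so logging the stream name is enough to reproduce the full interleaving.

```python
from collections import deque
import random


class RecordingChooser:
    """Picks the next input stream at random and appends each pick to a history log."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.history = []  # hypothetical persisted "chooser history"

    def choose(self, pending):
        # pending maps stream name -> deque of not-yet-processed messages
        candidates = sorted(name for name, queue in pending.items() if queue)
        stream = self.rng.choice(candidates)
        self.history.append(stream)
        return stream


class ReplayingChooser:
    """Ignores the usual selection policy and follows a recorded history."""

    def __init__(self, history):
        self.history = deque(history)

    def choose(self, pending):
        stream = self.history.popleft()
        if not pending[stream]:
            raise ValueError(f"history expects a message on {stream!r}")
        return stream


def run(chooser, inputs):
    """Process every message, letting `chooser` decide which stream goes next."""
    pending = {name: deque(msgs) for name, msgs in inputs.items()}
    processed = []
    while any(pending.values()):
        stream = chooser.choose(pending)
        processed.append(pending[stream].popleft())
    return processed


inputs = {"profiles": ["p1", "p2"], "views": ["v1", "v2", "v3"]}
recorder = RecordingChooser(seed=7)
original = run(recorder, inputs)
replayed = run(ReplayingChooser(recorder.history), inputs)
print(original == replayed)  # True
```

Logging only stream names keeps the history small; a real implementation would also need to persist the history durably and line it up with checkpointed offsets, which this sketch ignores.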
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)