Roger Hoover created SAMZA-405:
----------------------------------

             Summary: Trying for deterministic behavior on recovery and rewind
                 Key: SAMZA-405
                 URL: https://issues.apache.org/jira/browse/SAMZA-405
             Project: Samza
          Issue Type: Improvement
            Reporter: Roger Hoover


Ideally, we want streaming tasks to produce exactly the same output on recovery 
or rewind as they did (or would have) during normal operation.  After thinking 
harder about this, I don't believe it's possible with at-least-once semantics.  
I think duplicates break ordering guarantees.  Any message that updates local 
state can be surrounded on both sides by duplicates of another message that 
negates it.  Nonetheless, we can get closer now, and if Kafka later supports 
idempotent producers, we'll have what we want.
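
To illustrate the ordering problem, here is a minimal sketch.  The task, the 
key "x", and the message values are all hypothetical; the point is only that a 
redelivered message can undo a later update, so the state differs from what 
normal operation would have produced.

```python
def run_task(messages):
    """Hypothetical task: keep the latest value per key in local state."""
    state = {}
    snapshots = []
    for key, value in messages:
        state[key] = value
        # Record the state after each message, standing in for task output.
        snapshots.append(dict(state))
    return state, snapshots

# Normal operation: the first message sets x, the second overwrites it.
normal = [("x", "old"), ("x", "new")]

# At-least-once redelivery: the first message arrives again after the second,
# so the update made by ("x", "new") is surrounded by its negation.
with_duplicate = [("x", "old"), ("x", "new"), ("x", "old")]

final_normal, _ = run_task(normal)
final_dup, _ = run_task(with_duplicate)
```

Here final_normal holds the "new" value while final_dup has reverted to "old", 
even though no message was lost.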

See discussion here: 
http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201409.mbox/%3CCAPOm=tpsevpxludaxintxj9z54gyeanocv5dk7nbotsgpd-...@mail.gmail.com%3E

Here are three changes that seem to make sense for Samza to support in order to 
achieve this.

1) Bootstrapping is only appropriate on cold start, not when restoring saved 
state.  On recovery, local state will be restored from the change log.
2) Local state should be saved and restored atomically with checkpoint state.  
This may require support for transactions in Kafka.
3) Ability to store and replay message chooser history.  Samza could have a 
configuration option to save a history of the messages a task has processed.  
This log could be used during recovery or rewind to replay messages in a 
deterministic order. 
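
As a rough sketch of item 3, assuming the history only needs to record which 
input stream each message was taken from: the run() function, the pick policy, 
and the in-memory streams below are hypothetical stand-ins, not Samza's 
MessageChooser API.  It also assumes the recorded history is consistent with 
the messages available at replay time.

```python
from collections import deque

def run(streams, pick, history=None):
    """Consume every message in `streams`; return (processed, choices).

    streams: dict of stream name -> list of messages (in-memory stand-in).
    pick:    live policy, called with the list of non-empty stream names.
    history: optional recorded list of stream names to follow first.
    """
    queues = {name: deque(msgs) for name, msgs in streams.items()}
    replay = deque(history or [])
    processed, choices = [], []
    while any(queues.values()):
        if replay:
            # Recovery or rewind: follow the recorded order.
            name = replay.popleft()
        else:
            # Live operation: the policy decides, and the choice is logged.
            name = pick([n for n, q in queues.items() if q])
        processed.append((name, queues[name].popleft()))
        choices.append(name)
    return processed, choices

streams = {"clicks": ["c1", "c2"], "views": ["v1"]}

# First run: this policy happens to prefer "views".
first, log = run(streams, pick=max)

# Replay with a different live policy, but driven by the saved history.
replayed, _ = run(streams, pick=min, history=log)

# Same inputs and policy as the replay, but without the history.
unrecorded, _ = run(streams, pick=min)
```

With the saved history, the replay processes messages in the same order as the 
first run; without it, the different policy yields a different interleaving.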



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
