[ 
https://issues.apache.org/jira/browse/SAMZA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172529#comment-14172529
 ] 

Roger Hoover commented on SAMZA-405:
------------------------------------

bq. 1) Bootstrapping is only appropriate on cold start, not when restoring 
saved state. On recovery, local state will be restored from the change log.

I've come to realize that there are cases where bootstrapping on recovery still 
makes sense.  For a simple stream/table join, one of the input streams is used 
to populate the KV store.  In this case, a separate changelog for the KV store 
is not required because it would be an exact copy of the input stream.  On 
recovery, the KV store can be restored by bootstrapping the input stream 
again.  Actually, it's not a complete bootstrap; instead, the task should 
re-process the stream up to the last saved checkpoint.
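
To make the idea concrete, here is a minimal sketch in Python (not Samza's 
actual API).  The names restore_from_input and checkpoint_offset, and the 
list-of-tuples log, are hypothetical stand-ins for the table-side input stream 
and the task's last checkpoint.  It assumes the checkpoint records the offset 
of the last processed message and that a None value means a delete.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the table-side input stream: (offset, key, value).
# A value of None is treated as a delete (an assumption for this sketch).
Message = Tuple[int, str, Optional[str]]


def restore_from_input(log: List[Message], checkpoint_offset: int) -> Dict[str, str]:
    """Rebuild the KV store by re-processing the input stream up to and
    including the last checkpointed offset, instead of reading a changelog."""
    store: Dict[str, str] = {}
    for offset, key, value in log:
        if offset > checkpoint_offset:
            break  # stop at the checkpoint; normal processing resumes after it
        if value is None:
            store.pop(key, None)
        else:
            store[key] = value
    return store


log = [(0, "a", "1"), (1, "b", "2"), (2, "a", "3"), (3, "b", None), (4, "c", "5")]
restored = restore_from_input(log, checkpoint_offset=2)
print(restored)
```

Messages after the checkpoint are deliberately not applied here; they would be 
handled by normal processing once the task resumes.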

> Trying for deterministic behavior on recovery and rewind
> --------------------------------------------------------
>
>                 Key: SAMZA-405
>                 URL: https://issues.apache.org/jira/browse/SAMZA-405
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Roger Hoover
>
> Ideally, we want streaming tasks to produce the exact same output on recovery 
> or rewind as they did/would during normal operation.  After thinking harder 
> on this, I don't believe it's possible with at-least-once semantics.  I think 
> duplicates break ordering guarantees.  Any message that updates local 
> state can be surrounded on both sides by duplicates of another message 
> that negates it.  Nonetheless, we can get closer now, and if Kafka later 
> supports idempotent producers, we'll have what we want.
> See discussion here: 
> http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201409.mbox/%3CCAPOm=tpsevpxludaxintxj9z54gyeanocv5dk7nbotsgpd-...@mail.gmail.com%3E
> Here are three changes that seem to make sense for Samza to support in order 
> to achieve this.
> 1) Bootstrapping is only appropriate on cold start, not when restoring saved 
> state.  On recovery, local state will be restored from the change log.
> 2) Local state should be saved and restored atomically with checkpoint state. 
>  This may require support for transactions in Kafka.
> 3) Ability to store and replay message chooser history.  Samza could have a 
> configuration option to save a history of the messages a task has processed.  
> This log could be used during recovery or rewind to replay messages in a 
> deterministic order. 
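
A rough sketch of what item 3 could look like, in Python rather than Samza's 
API.  Everything here is hypothetical: the history is assumed to be a list of 
(stream name, offset) pairs recorded as the chooser picks messages, and 
process stands in for the task.

```python
from typing import Dict, List, Tuple

# Hypothetical history entry: (stream name, offset within that stream).
Choice = Tuple[str, int]


def process(streams: Dict[str, List[str]], order: List[Choice]) -> List[str]:
    """Process messages in the given order and return the output sequence."""
    return [f"{name}:{streams[name][offset]}" for name, offset in order]


streams = {"orders": ["o0", "o1"], "users": ["u0", "u1"]}

# Order the chooser happened to pick during normal operation, saved as history.
history: List[Choice] = [("users", 0), ("orders", 0), ("orders", 1), ("users", 1)]
original_output = process(streams, history)

# On recovery or rewind, replay the saved history instead of choosing again.
replayed_output = process(streams, list(history))
```

Replaying the recorded choices yields the same interleaving of input streams, 
so a task that is otherwise deterministic produces the same output sequence.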



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
