Hi Chris Thanks, this answers my question perfectly. I know it is a rather theoretical question, but I'm currently evaluating different node failure scenarios and was curious about certain precision drops in the results during some extreme cases. I think this is exactly what happened, the state of one particular stream task was "lost".
Best Nicolas On Mon, Jul 21, 2014 at 9:32 PM, Chris Riccomini < [email protected]> wrote: > Hey Nicolas, > > There are two topics in question here: > > 1. Checkpoint topic. > 2. State topic. > > (1) is used to store the input offsets for every SystemStreamPartition > that's consumed by your StreamTask. > (2) is used to store the writes/deletes by your StreamTask to a state > store. > > The first thing to establish is what you mean by "lost". In Kafka 0.8+, > topics are assumed to be highly available. If a cluster knows about a > topic (i.e. It has an entry in ZooKeeper), but the brokers that contain > the data for the topic are not available, consumers/producers will get > either a timeout (if they were talking to a dead broker) or a "no leader > for partition" exception. > > Now, if the topic/partition exists, and a leader is available for the > partition, but the leader has no data, then we might call it "lost". In > this case, a consumer/producer just sees the topic/partition as empty. > With this definition, we can "lose" either the checkpoint topic or the > state store topic. If the checkpoint topic is lost, the Samza job will > start using samza.offset.default. Likewise, if the state store topic is > "lost", Samza would just treat the store as empty. > > Cheers, > Chris > > On 7/21/14 11:25 AM, "Nicolas Bär" <[email protected]> wrote: > > >Hi All > > > >I'm looking into the case of Kafka broker failures with regard to offset > >management and state checkpointing. As far as I understood the offsets and > >state checkpoints are stored in separate Kafka topics and happen at the > >same time. Therefore if a samza job fails it will restore the state from > >the corresponding Kafka topics. > >Now what happens if for example the offset topic is available, but the > >data > >of the checkpoint topic is lost due to Kafka broker failures? I'm guessing > >Kafka restores the offset and continues as if there was never any state > >stored. Is this correct? > > > > > >Best > >Nicolas > >
