Hi, Qi, This would depend on the following two factors: # whether the send() is async or sync # how do you handle the send failure
If the send() is sync, you will always receive an exception in your process() method when MessageCollector.send() is invoked. Hence, if your code does not handle the exception, it would be thrown out to the RunLoop and the whole container will fail. If your code captures the exception, it is then up to your application logic to deal with the send failure (i.e. user will need to choose either ignore the send failure and proceed, or fail and stop). If you choose to not ignore the send failures, then in this case, the checkpoint will not proceed beyond the input that caused the send failures, and the container will restart with the previous checkpoint, which does not cause data loss. If the send() is async, the commit procedure in RunLoop will make sure to flush all pending sends before checkpointing. If the flush fails, the exception will be thrown out and the container will fail. Hence, when restarted, the container will repeat from the previous checkpoint (i.e. at least once delivery still holds and no data loss). Hope the above answers your question. Thanks! -Yi On Thu, May 11, 2017 at 12:43 AM, 舒琦 <[email protected]> wrote: > Hi Jagadish, > > I may not express my questions clearly. > > Here is what I want to know. When MessageCollector.send is called > in process method, if sending fail and fail again, under this situation is > it possible to cause data loss ( continue to fetch and process messages, > but can’t send them out, at the same time offset is still forwarding and > checkpointing ). > > Thanks very much. > > ———————— > Qi Shu > > > 在 2017年5月11日,15:35,Jagadish Venkatraman <[email protected]> 写道: > > > > Hi Qi, > > > >>> If one record can’t be sent out all the time, then the consumer > > will still fetch messages or not, and what about the offset > checkpointing? > > > > Polling / fetching messages from the consumer (in case of Kafka) happens > in > > a separate thread. > > > > Samza offers an at-least once processing guarantee with zero data loss. > > > > I'm not sure I understand your specific question about checkpointing? > > > > > > On Thu, May 11, 2017 at 12:28 AM, 舒琦 <[email protected]> wrote: > > > >> Hi, > >> > >> Below is the description about checkpointing. > >> > >> 『Checkpointing is guaranteed to only cover events that are fully > >> processed. It happens only when there are no pending > >> process()/processAsync() or WindowableTask.window() invocations. All > >> preceding invocations happen-before checkpointing and checkpointing > >> happens-before all subsequent invocations.』 > >> > >> If one record can’t be sent out all the time, then the consumer > >> will still fetch messages or not, and what about the offset > checkpointing? > >> > >> Thanks! > >> > >> ———————— > >> Qi Shu > > > > > > > > > > -- > > Jagadish V, > > Graduate Student, > > Department of Computer Science, > > Stanford University > >
