Re: Review Request 45258: Abandon producer retry after a certain # of errors : SAMZA-911

Jagadish Venkatraman Thu, 14 Apr 2016 16:39:33 -0700


> On April 14, 2016, 8:43 p.m., Chris Pettitt wrote:
> > samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemProducer.scala,
> >  line 71
> > <https://reviews.apache.org/r/45258/diff/1/?file=1312769#file1312769line71>
> >
> >     This is not directly related to your commit, but:
> >     
> >     This function is not reentrant. Is that expected? For example, it would 
> > be possible for a the Kafka callback to set sendFailed to true but a 
> > subsequent send would reset this flag.
> >     
> >     
> >     ---
> >     
> >     Separately, there is an implicit assumption that retryBackoff is 
> > providing a happens-before constraint between an invocation of the 
> > exception handler in retryBackoff and a subsequent invocation of the loop. 
> > This likely always holds, but a CAS for numRetries would cover cases where 
> > that does not hold.
> 
> Jagadish Venkatraman wrote:
>     Yes, that's true. 
>     
>     1.It's entirely possible for a Kafka callback to set sendFailed to true, 
> and a subsequent send to reset this flag. Ideally, errors in the callbacks 
> should shut-down the producer in the same callback thread.
>     2. It's also possible for a subsequent send (for msg2) to be processed 
> even if the send for the previous message (msg1) fails.
>     
>     This is the race in the producer that Navina mentioned in her review of 
> this RB. (and requested that a separate jira be created to track). SAMZA-934 
> tracks this.
> 
> Chris Pettitt wrote:
>     Yes, my list was not exhaustive :). Fine to track the first issue via 
> JIRA. The second issue is introduced in this RB, tho.


I think the only thing that may fail the happens before constraint is: Part of 
the exception handler is updated in one thread, and the rest is updated in a 
separate thread. (we don't have this case now). I'll change it to an 
AtomicInteger instead of an integer.


- Jagadish


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45258/#review128972
-----------------------------------------------------------


On April 14, 2016, 10:05 p.m., Jagadish Venkatraman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45258/
> -----------------------------------------------------------
> 
> (Updated April 14, 2016, 10:05 p.m.)
> 
> 
> Review request for samza, Jake Maes, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Bugs: SAMZA-911
>     https://issues.apache.org/jira/browse/SAMZA-911
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Currently, the KafkaSystemProducer's producer loop keeps retrying 
> indefinitely when there is an exception in the retryBackOff loop. If there 
> are repeated exceptions, then it makes sense to retry for awhile, and then 
> fail the container.
> 
> Long term, we should focus on getting rid off the retryBackOff loop, and 
> close the producer object in the callback during failure. This will guarantee 
> in-order delivery.
> 
> 1.Modified the KafkaSystemProducer to take a maxRetries. (currently, its set 
> to 30).
> 2.Add tests to verify retry in case of RetriableExceptions.
> 
> 
> Diffs
> -----
> 
>   
> samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemProducer.scala
>  9a44d46d29a1997958a9d2bbf7be0bde860fff64 
>   
> samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemProducer.scala
>  39426d8cf64516ec4fdc0cb4ff60b1df3a757470 
> 
> Diff: https://reviews.apache.org/r/45258/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests to verify functionality.
> 
> 
> Thanks,
> 
> Jagadish Venkatraman
> 
>

Re: Review Request 45258: Abandon producer retry after a certain # of errors : SAMZA-911

Reply via email to