> On April 14, 2016, 8:43 p.m., Chris Pettitt wrote: > > samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemProducer.scala, > > line 71 > > <https://reviews.apache.org/r/45258/diff/1/?file=1312769#file1312769line71> > > > > This is not directly related to your commit, but: > > > > This function is not reentrant. Is that expected? For example, it would > > be possible for a the Kafka callback to set sendFailed to true but a > > subsequent send would reset this flag. > > > > > > --- > > > > Separately, there is an implicit assumption that retryBackoff is > > providing a happens-before constraint between an invocation of the > > exception handler in retryBackoff and a subsequent invocation of the loop. > > This likely always holds, but a CAS for numRetries would cover cases where > > that does not hold. > > Jagadish Venkatraman wrote: > Yes, that's true. > > 1.It's entirely possible for a Kafka callback to set sendFailed to true, > and a subsequent send to reset this flag. Ideally, errors in the callbacks > should shut-down the producer in the same callback thread. > 2. It's also possible for a subsequent send (for msg2) to be processed > even if the send for the previous message (msg1) fails. > > This is the race in the producer that Navina mentioned in her review of > this RB. (and requested that a separate jira be created to track). SAMZA-934 > tracks this. > > Chris Pettitt wrote: > Yes, my list was not exhaustive :). Fine to track the first issue via > JIRA. The second issue is introduced in this RB, tho.
I think the only thing that may fail the happens before constraint is: Part of the exception handler is updated in one thread, and the rest is updated in a separate thread. (we don't have this case now). I'll change it to an AtomicInteger instead of an integer. - Jagadish ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/45258/#review128972 ----------------------------------------------------------- On April 14, 2016, 10:05 p.m., Jagadish Venkatraman wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/45258/ > ----------------------------------------------------------- > > (Updated April 14, 2016, 10:05 p.m.) > > > Review request for samza, Jake Maes, Navina Ramesh, and Yi Pan (Data > Infrastructure). > > > Bugs: SAMZA-911 > https://issues.apache.org/jira/browse/SAMZA-911 > > > Repository: samza > > > Description > ------- > > Currently, the KafkaSystemProducer's producer loop keeps retrying > indefinitely when there is an exception in the retryBackOff loop. If there > are repeated exceptions, then it makes sense to retry for awhile, and then > fail the container. > > Long term, we should focus on getting rid off the retryBackOff loop, and > close the producer object in the callback during failure. This will guarantee > in-order delivery. > > 1.Modified the KafkaSystemProducer to take a maxRetries. (currently, its set > to 30). > 2.Add tests to verify retry in case of RetriableExceptions. > > > Diffs > ----- > > > samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemProducer.scala > 9a44d46d29a1997958a9d2bbf7be0bde860fff64 > > samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemProducer.scala > 39426d8cf64516ec4fdc0cb4ff60b1df3a757470 > > Diff: https://reviews.apache.org/r/45258/diff/ > > > Testing > ------- > > Added unit tests to verify functionality. > > > Thanks, > > Jagadish Venkatraman > >