Xinyu,

Thanks for the answers. Those suggestions are helpful as well.
David

On Thu, Oct 6, 2016 at 12:48 PM xinyu liu <xinyuliu...@gmail.com> wrote:

> Hi, David,
>
> For your questions:
>
> 1) In this case Samza recovered, but the changelog message was lost. In
> 0.10.1, KafkaSystemProducer has a race condition: there is a small chance
> that a later send success overrides a previous failure. The bug is fixed
> in the upcoming 0.11.0 release (SAMZA-1019). The fix allows you to catch
> the exception and then decide to ignore or rethrow it. In the latter
> case the container will fail, and Samza will guarantee the message is
> reprocessed after it restarts.
>
> 2) There are several approaches that might help in your case. First, you
> can turn on compression for your changelog stream; that usually saves
> about 20%-30%. Second, you can bump up max.request.size for the
> producer; in that case you need to make sure the broker is also
> configured with a corresponding max message size. Last, you might try
> splitting the key into subkeys so that each value is smaller.
>
> Thanks,
> Xinyu
>
> On Thu, Oct 6, 2016 at 9:30 AM, David Yu <david...@optimizely.com> wrote:
>
> > Hi,
> >
> > Our Samza job (0.10.1) throws RecordTooLargeExceptions when flushing
> > KV store changes to the changelog topic, as well as when sending
> > outputs to Kafka. We have two questions about this problem:
> >
> > 1. It seems that after the affected containers failed multiple times,
> > the job was able to recover and move on. This is a bit hard to
> > understand. How could this be recoverable? We were glad it actually
> > did, but we are uncomfortable not knowing the reason behind it.
> > 2. What would be the best way to prevent this from happening? Since
> > Samza serde happens behind the scenes, there does not seem to be a
> > good way to find out the payload size in bytes before putting into
> > the KV store. Any suggestions on this?
> >
> > Thanks,
> > David
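The race Xinyu describes in (1) is about a send failure being overwritten by a later success before the error is surfaced. A minimal sketch of the fixed pattern (my own illustration, not the actual SAMZA-1019 code): remember only the *first* failure from the async callbacks, so later successful sends cannot mask it, and rethrow it when the producer is flushed.

```java
import java.util.concurrent.atomic.AtomicReference;

public class FailureLatch {
    // Keeps the FIRST async send failure; compareAndSet ensures a later
    // callback (success or failure) never overwrites it.
    private final AtomicReference<Exception> firstFailure = new AtomicReference<>();

    // Would be called from the producer's send callback; null means success.
    void onSendComplete(Exception maybeError) {
        if (maybeError != null) {
            firstFailure.compareAndSet(null, maybeError);
        }
    }

    // At flush time, surface the recorded failure so the container can
    // decide to rethrow (and be restarted/reprocessed) or ignore it.
    void flush() throws Exception {
        Exception e = firstFailure.get();
        if (e != null) throw e;
    }

    public static void main(String[] args) {
        FailureLatch latch = new FailureLatch();
        latch.onSendComplete(new RuntimeException("send failed"));
        latch.onSendComplete(null); // later success must not hide the failure
        try {
            latch.flush();
            System.out.println("no failure recorded");
        } catch (Exception e) {
            System.out.println("rethrown: " + e.getMessage());
        }
    }
}
```

This is why the restart is safe: the container fails before checkpointing past the lost message, so Samza reprocesses it on restart.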
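For (2), the size limits live on both sides and must agree: the producer's `max.request.size` and the broker's per-topic or global max message size. A hedged config sketch (values are illustrative, not recommendations):

```properties
# Producer side: raise the cap on a single request (default is ~1 MB).
max.request.size=2097152

# Broker side: the broker must accept messages at least that large,
# either globally (message.max.bytes in server.properties) or per topic
# (max.message.bytes via topic config). Otherwise the producer's larger
# requests are still rejected.
message.max.bytes=2097152
```

Note that raising the limits only buys headroom; compression and smaller values (e.g. subkey splitting) attack the root cause.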
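On David's point that serde happens behind the scenes: one workaround is to run the same serde yourself before `put()` and check the byte length against a budget derived from `max.request.size`. A minimal sketch, assuming a UTF-8 string value for illustration (a real job would invoke whatever `Serde` the store is configured with; the limit here is a hypothetical number):

```java
import java.nio.charset.StandardCharsets;

public class SizeGuard {
    // Hypothetical budget: producer max.request.size minus envelope overhead.
    static final int MAX_VALUE_BYTES = 900_000;

    // Serialize the value the same way the store's serde would (UTF-8 string
    // serde here, for illustration) and check the size before put().
    static boolean fitsInChangelog(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        return bytes.length <= MAX_VALUE_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(fitsInChangelog("small value")); // prints "true"
    }
}
```

This double-serializes oversized candidates, so it is worth guarding only the code paths known to produce large values.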
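Xinyu's last suggestion, splitting a key into subkeys, can be sketched as chunking one oversized value across derived keys so that each changelog record stays under the limit. The naming scheme (`key:0`, `key:1`, ...) and chunk size are my own assumptions:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class ValueChunker {
    // Split an oversized value into fixed-size chunks stored under subkeys
    // ("key:0", "key:1", ...); the reader reassembles them in order.
    static Map<String, byte[]> chunk(String key, byte[] value, int chunkSize) {
        Map<String, byte[]> parts = new LinkedHashMap<>();
        for (int i = 0, n = 0; i < value.length; i += chunkSize, n++) {
            int end = Math.min(i + chunkSize, value.length);
            parts.put(key + ":" + n, Arrays.copyOfRange(value, i, end));
        }
        return parts;
    }

    public static void main(String[] args) {
        // 10-byte value, 4-byte chunks -> subkeys k:0 (4), k:1 (4), k:2 (2).
        Map<String, byte[]> parts = chunk("k", new byte[10], 4);
        System.out.println(parts.keySet()); // prints "[k:0, k:1, k:2]"
    }
}
```

The trade-off is extra reads on the consuming side and the need to delete all subkeys together when the logical entry is removed.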