Xinyu,

Thanks for the answers. Those suggestions are helpful as well.

David

On Thu, Oct 6, 2016 at 12:48 PM xinyu liu <xinyuliu...@gmail.com> wrote:

> Hi, David,
>
> For your questions:
>
> 1) In this case Samza recovered but the changelog message was lost. In
> 0.10.1, KafkaSystemProducer has a race condition: there is a small chance
> that a later successful send will override a previous failure. The bug is
> fixed in the upcoming 0.11.0 release (SAMZA-1019). With the fix, you can
> catch the exception and decide whether to ignore or rethrow it. In the
> latter case the container will fail, and Samza guarantees the message will
> be reprocessed after it restarts.
>
> 2) There are several things that might help in your case. First, you can
> turn on compression for your changelog stream; that usually saves about
> 20%-30%. Second, you can bump up max.request.size for the producer; in
> that case you need to make sure the broker's maximum message size
> (message.max.bytes) is raised to match. Last, you might also try splitting
> the key into subkeys so that each value is smaller.
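
For reference, the first two suggestions map to configuration roughly like the following. The property names come from Kafka's producer and broker configs; the system name "kafka" and the 5 MB limit are placeholders to adapt:

```properties
# Samza job config -- producer.* properties pass through to the Kafka producer.
# The system name "kafka" and the 5 MB size below are illustrative placeholders.
systems.kafka.producer.compression.type=gzip
systems.kafka.producer.max.request.size=5242880

# Broker side (server.properties): the broker must accept messages this large.
message.max.bytes=5242880
```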
>
> Thanks,
> Xinyu
>
> On Thu, Oct 6, 2016 at 9:30 AM, David Yu <david...@optimizely.com> wrote:
>
> > Hi,
> >
> > Our Samza job (0.10.1) throws RecordTooLargeExceptions when flushing the
> > KV store changes to the changelog topic, as well as when sending outputs
> > to Kafka. We have two questions about this problem:
> >
> > 1. It seems that after the affected containers failed multiple times, the
> > job was able to recover and move on. This is a bit hard to understand.
> > How could this be recoverable? We were glad it actually did, but are
> > uncomfortable not knowing the reason behind it.
> > 2. What would be the best way to prevent this from happening? Since Samza
> > serde happens behind the scenes, there does not seem to be a good way to
> > find out the payload size in bytes before putting a value into the KV
> > store. Any suggestions on this?
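
On question 2, one workaround is to pre-serialize the value yourself (mirroring whatever serde the store is configured with) and check the byte size before calling put. A minimal sketch, assuming a plain UTF-8 string serde for illustration:

```java
import java.nio.charset.StandardCharsets;

public class PayloadSizeCheck {
    // Kafka's default max.request.size is 1 MB; adjust to match your producer config.
    static final int MAX_BYTES = 1_048_576;

    // Mirror the store's serde here; a UTF-8 string serde is an assumed example.
    static byte[] serialize(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Call this before store.put() to decide whether to split or reject the value.
    static boolean fitsInRequest(String value) {
        return serialize(value).length <= MAX_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(serialize("hello").length); // prints 5
        System.out.println(fitsInRequest("hello"));    // prints true
    }
}
```

Note this double-serializes each value, so it costs some CPU; it is a size guard, not a substitute for raising the producer and broker limits.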
> >
> > Thanks,
> > David
> >
>
