Hi, David,

To answer your questions:

1) In this case Samza recovered, but the changelog message was lost. In
0.10.1, KafkaSystemProducer has a race condition: there is a small chance
that a later successful send overrides an earlier failure. The bug is fixed
in the upcoming 0.11.0 release (SAMZA-1019). With the fix you can catch the
exception and decide whether to ignore or rethrow it. In the latter case the
container will fail, and Samza guarantees the message will be reprocessed
after it restarts. A rough sketch of that rethrow pattern is below.
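For illustration only (this sketches the pattern the fix enables, not the
exact 0.11.0 API; the task class and stream names are made up):

  import org.apache.samza.SamzaException;
  import org.apache.samza.system.IncomingMessageEnvelope;
  import org.apache.samza.system.OutgoingMessageEnvelope;
  import org.apache.samza.system.SystemStream;
  import org.apache.samza.task.MessageCollector;
  import org.apache.samza.task.StreamTask;
  import org.apache.samza.task.TaskCoordinator;

  public class MyTask implements StreamTask {
    private static final SystemStream OUTPUT =
        new SystemStream("kafka", "output-topic");

    @Override
    public void process(IncomingMessageEnvelope envelope,
        MessageCollector collector, TaskCoordinator coordinator) {
      try {
        collector.send(new OutgoingMessageEnvelope(OUTPUT, envelope.getMessage()));
      } catch (SamzaException e) {
        // Swallow here if losing the message is acceptable. Rethrowing fails
        // the container, and Samza will reprocess the message after restart.
        throw e;
      }
    }
  }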

2) There are several things that might help in your case (sketches for each
are below):

- Turn on compression for your changelog stream. That usually saves about
  20%-30%.
- Bump up max.request.size for the producer. In this case you also need to
  make sure the broker allows the corresponding max message size.
- Split the key into subkeys so each value is smaller.
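For the first two, the producer settings can go straight into the job
config. A sketch, assuming your Kafka system is named "kafka" and using
5 MB purely as an example size:

  # Pass-through Kafka producer properties
  systems.kafka.producer.compression.type=gzip
  systems.kafka.producer.max.request.size=5242880

  # The broker and topic must accept that size as well, e.g.
  #   broker: message.max.bytes=5242880, replica.fetch.max.bytes=5242880
  #   topic:  max.message.bytes=5242880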
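For the last one, a minimal sketch of the subkey idea (the helper, the
":count"/":<i>" key naming, and the chunk size are all made up; adapt them
to your serde):

  import java.nio.ByteBuffer;
  import java.util.Arrays;
  import org.apache.samza.storage.kv.KeyValueStore;

  public class ChunkedPut {
    // Writes value under key:0 .. key:N-1, plus key:count holding N, so no
    // single changelog record exceeds the producer's max.request.size.
    static void putChunked(KeyValueStore<String, byte[]> store,
        String key, byte[] value) {
      final int chunkSize = 512 * 1024; // keep well under the record limit
      int numChunks = (value.length + chunkSize - 1) / chunkSize;
      store.put(key + ":count", ByteBuffer.allocate(4).putInt(numChunks).array());
      for (int i = 0; i < numChunks; i++) {
        int from = i * chunkSize;
        int to = Math.min(from + chunkSize, value.length);
        store.put(key + ":" + i, Arrays.copyOfRange(value, from, to));
      }
    }
  }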

Thanks,
Xinyu

On Thu, Oct 6, 2016 at 9:30 AM, David Yu <david...@optimizely.com> wrote:

> Hi,
>
> Our Samza job (0.10.1) throws RecordTooLargeExceptions when flushing KV
> store changes to the changelog topic, as well as when sending outputs to
> Kafka. We have two questions about this problem:
>
> 1. It seems that after the affected containers failed multiple times, the
> job was able to recover and move on. This is a bit hard to understand. How
> could this be recoverable? We were glad it actually did, but are
> uncomfortable not knowing the reason behind it.
> 2. What would be the best way to prevent this from happening? Since Samza
> serde happens behind the scenes, there does not seem to be a good way to
> find out the payload size in bytes before putting a value into the KV
> store. Any suggestions on this?
>
> Thanks,
> David
>
