It may or may not be due to colocating Kafka and Samza, but you are probably
tripping Kafka's failure detection, which considers a replica out of sync
if it falls more than N messages behind the leader. Can you try tuning this
setting as described here:
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowtoreducechurnsinISR?WhendoesabrokerleavetheISR
?
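
For illustration, here is a rough Python model of that lag check. It is only
a sketch, not Kafka's actual code: the function names, batch sizes, and
fetch rate are made up. In 0.8.x the message threshold is the broker setting
replica.lag.max.messages (with replica.lag.time.max.ms as the time-based
counterpart); the value 4000 below is used only as an example threshold.

```python
def is_out_of_sync(leader_end_offset, follower_end_offset, max_lag_messages):
    """Return True if the follower trails the leader by more than the threshold."""
    return leader_end_offset - follower_end_offset > max_lag_messages


def simulate(batch_sizes, fetch_per_tick, max_lag_messages):
    """Count the ticks on which the follower exceeds the lag threshold (toy model)."""
    leader = follower = 0
    out_of_sync_ticks = 0
    for batch in batch_sizes:
        leader += batch  # producer appends a batch to the leader's log
        if is_out_of_sync(leader, follower, max_lag_messages):
            out_of_sync_ticks += 1
        # follower fetches up to fetch_per_tick messages, never past the leader
        follower = min(leader, follower + fetch_per_tick)
    return out_of_sync_ticks


# A single 8000-message burst exceeds a 4000-message threshold even though
# the follower is fast enough to catch up on the very next fetch.
bursty = [1000, 8000, 1000]
print(simulate(bursty, fetch_per_tick=10000, max_lag_messages=4000))
print(simulate(bursty, fetch_per_tick=10000, max_lag_messages=10000))
```

In this toy model the burst trips the threshold on one tick with the lower
setting and on none with the higher one, which is why a threshold set above
the largest expected burst tends to reduce ISR churn.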

-Jay

On Mon, Feb 9, 2015 at 4:35 PM, Karthik Sriram <amaron...@gmail.com> wrote:

> Hey all,
> I'm trying to run Samza on a 5-node (YARN/Kafka/ZK) cluster on AWS, with
> each box running all 3 processes. I have been seeing very odd performance
> issues with Kafka when it is run this way. Kafka seems to get unbalanced
> often, with replicas repeatedly going out of sync. This results in lost
> messages when producing to this cluster. I initially suspected a scale
> issue (70k-80k qps of incoming messages, ~120k qps peak) and reduced
> write throughput by sampling just 10% of the messages, but I still saw
> the same issues. The odd part is that this doesn't happen on every run,
> but it does happen on many of them.
>
> We have been using a much larger Kafka cluster for a long time with great
> performance and have never seen such issues before. Then I saw (
> https://engineering.linkedin.com/samza/operating-apache-samza-scale), which
> mentions that LinkedIn also faced some issues when colocating Samza and
> Kafka.
>
> Can someone shed some light on this? Is colocating Samza and Kafka a
> strict no, or is it more likely a Kafka/machine tuning issue? Any help is
> appreciated!
>
> Kafka version : 0.8.1.1
> Samza version: 0.8
>
> Thanks a lot for your time,
> Karthik
>
