It may or may not be due to colocating Kafka and Samza, but you are probably tripping Kafka's failure detection, which considers a replica out of sync if it falls more than N messages behind the leader. Can you try tuning this setting as described in the FAQ?
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowtoreducechurnsinISR?WhendoesabrokerleavetheISR
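
To make the rule concrete, here is a rough sketch of the check in Python. It is a simplified model for illustration, not Kafka's actual code. The function and parameter names are made up; the two thresholds are meant to correspond to the 0.8.x broker settings replica.lag.max.messages and replica.lag.time.max.ms, whose defaults I believe are 4000 and 10000.

```python
def in_sync(leader_end_offset, follower_end_offset, ms_since_last_fetch,
            max_lag_messages=4000, max_lag_time_ms=10000):
    """Simplified model of the ISR membership check (hypothetical helper).

    A follower stays in the ISR only if it is within max_lag_messages of
    the leader's log end AND has fetched within max_lag_time_ms.
    """
    lag = leader_end_offset - follower_end_offset
    return lag <= max_lag_messages and ms_since_last_fetch <= max_lag_time_ms


# A large produce burst can briefly put a healthy follower 5000 messages
# behind, which exceeds the assumed default threshold of 4000.
print(in_sync(10000, 5000, ms_since_last_fetch=100))
# Raising the message threshold tolerates the same burst.
print(in_sync(10000, 5000, ms_since_last_fetch=100, max_lag_messages=20000))
```

If this model is close, a high-throughput topic can drop replicas from the ISR on bursts alone, even when the followers are healthy, which is why raising the message threshold is the first thing to try.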
-Jay

On Mon, Feb 9, 2015 at 4:35 PM, Karthik Sriram <amaron...@gmail.com> wrote:

> Hey all,
> I'm trying to run samza on a 5 node (YARN/Kafka/ZK) cluster with each box
> running all 3 processes on AWS. I have been facing very weird performance
> issues with Kafka when run this way. Kafka seems to get unbalanced very
> often with replicas going out of sync every so often. This results in lost
> messages when producing to this cluster. I initially suspected it was a
> scale issue (70k-80k qps of incoming messages, ~120k qps peak) and reduced
> write throughput by sampling just 10% of the messages but I still noticed
> the same issues. The weird part is that this doesn't happen every time I
> run, but many of the times.
>
> We have been using a much larger Kafka cluster for long with great
> performance and have never seen such issues before. Then I saw (
> https://engineering.linkedin.com/samza/operating-apache-samza-scale) which
> mentions that LinkedIn also faced some issues when collocating Samza and
> Kafka.
>
> Can someone throw some light on this? Is collocating samza and kafka a
> strict no, or is it more likely a Kafka/machine tuning issue ? Any help is
> appreciated!
>
> Kafka version : 0.8.1.1
> Samza version: 0.8
>
> Thanks a lot for your time,
> Karthik
>