Are you seeing substantial variance in performance caused by high CPU load and cache churn? I've seen this sort of inadequate performance isolation wreak havoc on high-QPS systems.
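
To put a rough number on how short a stall could matter for the message-count threshold Jay describes below, here is a back-of-the-envelope sketch. It is only illustrative: the 4000 figure is what I believe the 0.8.x default for `replica.lag.max.messages` to be, the partition counts are hypothetical (the thread doesn't state them), the function name is made up, and traffic is assumed to be spread evenly across partitions.

```python
def stall_budget_ms(lag_max_messages: int, msgs_per_sec: float, partitions: int) -> float:
    """Milliseconds a follower can stop fetching before the lag on one
    partition exceeds the threshold, assuming evenly spread traffic."""
    per_partition_rate = msgs_per_sec / partitions  # messages/sec on one partition
    return 1000.0 * lag_max_messages / per_partition_rate


# 120k msgs/sec is the peak rate from the original post.
# 4000 is an assumed threshold; the partition counts are hypothetical.
for partitions in (1, 8, 32):
    print(partitions, round(stall_budget_ms(4000, 120_000, partitions), 1))
```

Under those assumptions the budget is tens to hundreds of milliseconds at peak unless the topic has many partitions, which is the range where CPU contention from colocated processes could plausibly push a follower out of the ISR.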

On Mon Feb 09 2015 at 4:55:28 PM Jay Kreps <jay.kr...@gmail.com> wrote:

> It may or may not be due to colocating Kafka and Samza but you are probably
> tripping the failure detection in Kafka which considers a replica out of
> sync if it falls more than N messages behind. Can you try tuning this
> setting as described here:
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowtoreducechurnsinISR?WhendoesabrokerleavetheISR?
>
> -Jay
>
> On Mon, Feb 9, 2015 at 4:35 PM, Karthik Sriram <amaron...@gmail.com> wrote:
>
> > Hey all,
> > I'm trying to run samza on a 5 node (YARN/Kafka/ZK) cluster with each box
> > running all 3 processes on AWS. I have been facing very weird performance
> > issues with Kafka when run this way. Kafka seems to get unbalanced very
> > often with replicas going out of sync every so often. This results in lost
> > messages when producing to this cluster. I initially suspected it was a
> > scale issue (70k-80k qps of incoming messages, ~120k qps peak) and reduced
> > write throughput by sampling just 10% of the messages but I still noticed
> > the same issues. The weird part is that this doesn't happen every time I
> > run, but many of the times.
> >
> > We have been using a much larger Kafka cluster for long with great
> > performance and have never seen such issues before. Then I saw
> > (https://engineering.linkedin.com/samza/operating-apache-samza-scale) which
> > mentions that LinkedIn also faced some issues when collocating Samza and
> > Kafka.
> >
> > Can someone throw some light on this? Is collocating samza and kafka a
> > strict no, or is it more likely a Kafka/machine tuning issue ? Any help is
> > appreciated!
> >
> > Kafka version : 0.8.1.1
> > Samza version: 0.8
> >
> > Thanks a lot for your time,
> > Karthik