Tried bisecting, but it turns out things have been broken for some time. We
really need some system tests in place so that even new code can't stay
broken for this long.

At 49026f11781181c38e9d5edb634be9d27245c961 (May 14th), we went from good
performance to an error because the broker apparently would not accept the
partition assignment strategy. Since this commit seems to add heartbeats
and the server-side code for partition assignment strategies, I assume
something was missing on the client side, and filling in the server side is
what made things stop working.

On either 84636272422b6379d57d4c5ef68b156edc1c67f8 or
a5b11886df8c7aad0548efd2c7c3dbc579232f03 (July 17th), I am able to run the
perf test again, but it's slow: ~10MB/s for me vs. the 2MB/s Jay was
seeing, and either way far below the 600MB/s I saw on the earlier commits.

Added this to the new consumer checklist, marked for 0.8.3, and at least
for now assigned to Jason since I think he'll probably be able to sort this
out most quickly: https://issues.apache.org/jira/browse/KAFKA-2486

-Ewen


On Thu, Aug 27, 2015 at 8:03 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> 436b7ddc386eb688ba0f12836710f5e4bcaa06c8 is pretty recent, and there could
> be some recent consumer improvement patches that introduced a regression. I
> would suggest doing a binary search in the commit log, starting from
> 3f8480ccfb011eb43da774737597c597f703e11b (or maybe even earlier), as a
> quick check.
>
> Guozhang
>
> On Thu, Aug 27, 2015 at 4:39 PM, Jay Kreps <j...@confluent.io> wrote:
>
> > I think this is likely a regression. The two clients had more or less
> > equivalent performance when we checked in the code (see my post on this
> > earlier in the year). Looks like maybe we broke something in the
> > interim?
> >
> > On my laptop the new consumer perf seems to have dropped from about
> > 200MB/sec to about 2MB/sec.
> >
> > -Jay
> >
> >
> > On Thu, Aug 27, 2015 at 4:21 PM, Ewen Cheslack-Postava <e...@confluent.io>
> > wrote:
> >
> > > I don't think the commands are really equivalent despite just adding the
> > > --new-consumer flag. ConsumerPerformance uses a single thread when using
> > > the new consumer (it literally just allocates the consumer, loops until
> > > it's consumed enough, then exits), whereas the old consumer uses a bunch
> > > of additional threads.
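> > >
> > > For reference, the new-consumer path amounts to roughly the following
> > > single-threaded loop. This is a simplified sketch against the public
> > > KafkaConsumer API, not the actual ConsumerPerformance code; the broker
> > > address, group id, topic, and message target are just placeholders:
> > >
> > >   import java.util.Collections;
> > >   import java.util.Properties;
> > >   import org.apache.kafka.clients.consumer.ConsumerRecords;
> > >   import org.apache.kafka.clients.consumer.KafkaConsumer;
> > >
> > >   public class NewConsumerPerfSketch {
> > >       public static void main(String[] args) {
> > >           Properties props = new Properties();
> > >           props.put("bootstrap.servers", "brokerIp:9092"); // placeholder
> > >           props.put("group.id", "perf-test");              // placeholder
> > >           props.put("key.deserializer",
> > >               "org.apache.kafka.common.serialization.ByteArrayDeserializer");
> > >           props.put("value.deserializer",
> > >               "org.apache.kafka.common.serialization.ByteArrayDeserializer");
> > >
> > >           KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
> > >           consumer.subscribe(Collections.singletonList("test"));
> > >
> > >           // Everything -- fetching, decompression, counting -- happens on
> > >           // this single thread, inside poll().
> > >           long messages = 0;
> > >           while (messages < 5000000L) {  // like --messages 5000000
> > >               ConsumerRecords<byte[], byte[]> records = consumer.poll(100);
> > >               messages += records.count();
> > >           }
> > >           consumer.close();
> > >       }
> > >   }
> > >
> > > The old consumer, by contrast, does its fetching on background threads
> > > and iterates on separate consumer threads -- the "bunch of additional
> > > threads" mentioned above -- so the two numbers don't measure the same
> > > thing.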
> > >
> > > To really compare performance, someone would have to think through a fair
> > > way to compare them -- the two operate so differently that you'd have to
> > > be very careful to get an apples-to-apples comparison.
> > >
> > > By the way, membership in consumer groups should be a lot cheaper with the
> > > new consumer (the ZK coordination issues with lots of consumers aren't a
> > > problem since ZK is not involved), so you can probably scale up the number
> > > of consumer threads with little impact. It might be nice to patch the
> > > consumer perf test to respect the # of threads setting, which would be a
> > > first step toward a more reasonable comparison.
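> > >
> > > If someone picks that up, the shape would roughly be one KafkaConsumer per
> > > thread (a single instance isn't meant to be shared across threads), all in
> > > the same group so the coordinator divides the partitions among them. A
> > > rough sketch, reusing the configs from the snippet above -- the thread
> > > count, counter, and topic are made up, and the java.util.concurrent
> > > imports are omitted:
> > >
> > >   final int numThreads = 4;                 // hypothetical value
> > >   final Properties props = new Properties(); // fill in same configs as above
> > >   final AtomicLong totalMessages = new AtomicLong();
> > >   ExecutorService pool = Executors.newFixedThreadPool(numThreads);
> > >   for (int i = 0; i < numThreads; i++) {
> > >       pool.submit(new Runnable() {
> > >           public void run() {
> > >               // One consumer per thread; same group.id, so each thread
> > >               // ends up owning a subset of the topic's partitions.
> > >               KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
> > >               try {
> > >                   consumer.subscribe(Collections.singletonList("test"));
> > >                   while (totalMessages.get() < 5000000L) {
> > >                       ConsumerRecords<byte[], byte[]> records = consumer.poll(100);
> > >                       totalMessages.addAndGet(records.count());
> > >                   }
> > >               } finally {
> > >                   consumer.close();
> > >               }
> > >           }
> > >       });
> > >   }
> > >   pool.shutdown();
> > >
> > > One caveat: the topic needs at least as many partitions as threads, or the
> > > extra consumers will just sit idle.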
> > >
> > > -Ewen
> > >
> > > On Thu, Aug 27, 2015 at 11:25 AM, Poorna Chandra Tejashvi Reddy <
> > > pctre...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > We have built the latest Kafka from https://github.com/apache/kafka,
> > > > based on commit id 436b7ddc386eb688ba0f12836710f5e4bcaa06c8.
> > > > We ran the performance test on a 3-node Kafka cluster. There is a huge
> > > > throughput degradation using the new consumer compared to the regular
> > > > consumer. Below are the numbers:
> > > >
> > > > bin/kafka-consumer-perf-test.sh --zookeeper zkIp:2181 --broker-list
> > > > brokerIp:9092 --topics test --messages 5000000 : gives a throughput of
> > > > 693K
> > > >
> > > > bin/kafka-consumer-perf-test.sh --zookeeper zkIp:2181 --broker-list
> > > > brokerIp:9092 --topics test --messages 5000000 --new-consumer : gives a
> > > > throughput of 51K
> > > >
> > > > The whole setup is on EC2, with the Kafka brokers running on r3.2xlarge
> > > > instances.
> > > >
> > > > Are you guys aware of this performance degradation? Do you have a JIRA
> > > > for this that can be used to track the resolution?
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > -Poorna
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Ewen
> > >
> >
>
>
>
> --
> -- Guozhang
>



-- 
Thanks,
Ewen
