[jira] [Commented] (SAMZA-111) SystemConsumers is slow with large partition count

Chris Riccomini (JIRA) Thu, 19 Dec 2013 10:54:24 -0800

    [ 
https://issues.apache.org/jira/browse/SAMZA-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853138#comment-13853138
 ]


Chris Riccomini commented on SAMZA-111:
---------------------------------------

I have been doing a bit of digging into this.

I've written a MockSystem that simulates a KafkaSystem, with a configurable 
number of BrokerProxy threads. When I attach a debugger and take CPU samples, 
I can see some interesting problems:

1. SystemConsumers is not filtering out SystemStreamPartitions that already 
have buffered messages before calling poll on the underlying system (in this 
case, KafkaSystemConsumer, which extends BlockingEnvelopeMap). This means 
every poll() request iterates over all SystemStreamPartitions, even though 
every single one already has outstanding messages to process and doesn't need 
to be polled.
2. If you filter the fetchMap appropriately, the next bottleneck is in 
SystemConsumers.poll, which spends all of its time iterating over the 
fetchMap to do the filtering.
3. The systemFetchMap.size call in SystemConsumers.poll is taking 15% of total 
CPU on the main thread.
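
To make item 1 concrete, here is a minimal Python sketch of the kind of filtering that is missing. It is an illustration only, not Samza's code: the function name partitions_needing_poll, the buffered dict, and the string partition ids are all hypothetical stand-ins for the per-SystemStreamPartition buffers.

```python
from collections import deque


def partitions_needing_poll(buffered):
    """Return only the partitions whose local buffer is empty.

    `buffered` maps a partition id to a deque of unprocessed messages
    (a hypothetical stand-in for per-SystemStreamPartition buffers).
    """
    return {ssp for ssp, queue in buffered.items() if not queue}


buffered = {
    "stream-a/0": deque(["m1", "m2"]),  # still has outstanding messages
    "stream-a/1": deque(),              # drained, so it needs a poll
    "stream-b/0": deque(["m3"]),        # still has outstanding messages
}

# Only the drained partition would be handed to the underlying consumer.
to_poll = partitions_needing_poll(buffered)
```

Note that this filter is itself a full scan over every partition on each call, which is the cost that item 2 describes once the filtering is in place.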

I did a quick hack to keep the filtered systemFetchMap cached as fetchMap is 
updated (rather than rebuilding every time poll is called), and temporarily 
disabled the systemFetchMap.size call. These two changes boosted simulated 
performance from 650 messages/sec to 70,000-100,000 messages/sec with 12 
threads, 1000 streams, and 4 partitions each (4000 partitions divided evenly 
between 12 simulated BrokerProxy threads).
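
The general idea of keeping the filtered view cached, rather than rebuilding it on every poll, can be sketched roughly as below. This is not the hack itself, just a small Python illustration under assumptions: the FetchTracker class, its method names, and the string partition ids are hypothetical, and the sketch ignores threading entirely.

```python
class FetchTracker:
    """Tracks which partitions need polling, updated incrementally.

    Hypothetical sketch: the set of partitions to poll is adjusted as
    messages are buffered and processed, so no full scan is needed
    when it is time to poll.
    """

    def __init__(self, partitions):
        # Number of buffered, unprocessed messages per partition.
        self._pending = {ssp: 0 for ssp in partitions}
        # Cached filtered view: partitions with nothing buffered.
        self._needs_poll = set(partitions)

    def on_messages_buffered(self, ssp, count):
        """Record that `count` new messages were buffered for `ssp`."""
        if count <= 0:
            return
        self._pending[ssp] += count
        self._needs_poll.discard(ssp)

    def on_message_processed(self, ssp):
        """Record that one buffered message for `ssp` was consumed."""
        if self._pending[ssp] == 0:
            raise ValueError(f"no buffered messages for {ssp}")
        self._pending[ssp] -= 1
        if self._pending[ssp] == 0:
            # Buffer drained: this partition needs polling again.
            self._needs_poll.add(ssp)

    def needs_poll(self):
        """Return the cached set of partitions to poll (no scan)."""
        return self._needs_poll

    def poll_count(self):
        """Size of the cached set; len() on a Python set is O(1)."""
        return len(self._needs_poll)


tracker = FetchTracker(["a/0", "a/1", "b/0"])
tracker.on_messages_buffered("a/0", 2)
tracker.on_message_processed("a/0")
```

Each update touches only the partition that changed, so the per-poll cost no longer grows with the total partition count. The trade-off is that every code path that buffers or consumes a message has to keep the cached set consistent.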

I am convinced that we can make this go much faster, as well. My hack is not 
really that efficient.

I'll post a patch with the MockSystem shortly.

> SystemConsumers is slow with large partition count
> --------------------------------------------------
>
>                 Key: SAMZA-111
>                 URL: https://issues.apache.org/jira/browse/SAMZA-111
>             Project: Samza
>          Issue Type: Bug
>          Components: container
>    Affects Versions: 0.6.0
>            Reporter: Chris Riccomini
>            Assignee: Chris Riccomini
>
> We have been seeing very slow processing speed when running a Samza container 
> that consumes from 1000s of partitions. We don't see a corresponding slow 
> speed when running the same code, but with fewer input partitions (say 8-24).
> Messages per second seem to drop off as more partitions are added to the 
> Samza container. One Samza job has ~2500 partitions, and is seeing only 6000 
> messages/sec. The same code running with ~9 partitions is seeing 30,000 
> messages/sec.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
