Also, I honestly did not try to optimize the throughput/latency at all.  My
goal was to profile it in comparison to the default ZeroMQ implementation
to be sure that there was no regression.

—Bobby

On 4/1/14, 8:17 AM, "Sean Zhong" <[email protected]> wrote:

>I did some experiments and was able to double the max throughput for small
>messages (100 bytes) by changing the Netty glue code. But even after that,
>there is still a scalability problem: the resources cannot be fully used.
>I realized that in your test you only have an 8-core CPU, so for you the
>CPU is the bottleneck, while for me, with a 32-core (64 virtual) CPU, the
>problem is exposed.
>
>You are right, it is a "latency vs throughput" problem. In particular, if
>we buffer too much, it is possible that the buffer holds everything up to
>spout.max.pending; then there is no traffic and the topology just waits
>for nothing. I am still experimenting to find a better balance between
>latency and throughput.
>
>
>Sean
>
>
>On Tue, Apr 1, 2014 at 4:16 AM, Bobby Evans <[email protected]> wrote:
>
>> You are correct that we do not send a new batch of messages until the
>> current batch has been acked.  It should not be too difficult to switch
>> to pipelining the messages so more than one batch is in flight at any
>> point in time, but we wanted to get accuracy before digging more deeply
>> into performance.
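>>
>> A rough sketch of what pipelining could look like (Java; the class and
>> names are illustrative, not the actual messaging client):
>>
>>     // Hypothetical sliding-window sender: allow up to `window` un-acked
>>     // batches in flight instead of waiting for each ack before sending
>>     // the next batch.
>>     class PipelinedSender {
>>         private final int window;   // max batches in flight
>>         private int inFlight = 0;
>>
>>         PipelinedSender(int window) { this.window = window; }
>>
>>         synchronized void send(Object batch) throws InterruptedException {
>>             while (inFlight >= window) {
>>                 wait();             // block only when the window is full
>>             }
>>             inFlight++;
>>             // channel.write(batch) would go here (omitted in the sketch)
>>         }
>>
>>         synchronized void onAck() {
>>             inFlight--;
>>             notifyAll();            // free a slot for the next batch
>>         }
>>     }
>>
>> With window = 1 this degenerates to the current stop-and-wait behavior;
>> a larger window keeps the link busy at the cost of more buffered data.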
>>
>> As for the fixed batch size, that is a latency vs throughput question,
>> and the right value is likely to vary depending on the use case you
>> have.
>>
>> The bigger problem that I have seen is with the number of threads that
>> Netty is using for larger topologies.  I think we have a fix for that,
>> but Andy and I have not had the time to put together a patch for the
>> community yet.  I will try to get to it this week.
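>>
>> For context, the Netty thread knobs that exist today (Java; the key
>> names should match the 0.9.x defaults; the values are just the shipped
>> defaults as I understand them, not the fix mentioned above):
>>
>>     import java.util.HashMap;
>>     import java.util.Map;
>>
>>     Map<String, Object> conf = new HashMap<String, Object>();
>>     conf.put("storm.messaging.transport",
>>              "backtype.storm.messaging.netty.Context");
>>     // Worker-thread counts for the Netty server and client sides; the
>>     // scaling concern is how the total thread count grows with the
>>     // number of connections in a large topology.
>>     conf.put("storm.messaging.netty.server_worker_threads", 1);
>>     conf.put("storm.messaging.netty.client_worker_threads", 1);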
>>
>> —Bobby
>>
>> On 3/26/14, 6:27 AM, "Sean Zhong" <[email protected]> wrote:
>>
>> >When running the benchmark developed by Bobby
>> >(http://yahooeng.tumblr.com/post/64758709722/making-storm-fly-with-netty),
>> >I found that neither the CPU, memory, nor network can be saturated when
>> >the message size is small (10 bytes - 100 bytes).
>> >
>> >
>> >message size (bytes)    spout throughput (MB/s)
>> >   10                     3
>> >   20                     7
>> >   40                    16.25
>> >   80                    32.88
>> >  100                    43.13
>> >  200                    80.13
>> >  400                   138
>> >  800                   186.38
>> > 1000                   196.38
>> >10000                   234.75
>> >
>> >I have 4 nodes, and each node has a very powerful CPU, an E5-2680 (32
>> >cores). The throughput reaches its peak when only 30% of each machine's
>> >CPU and only 1/6 of the network bandwidth are used.
>> >
>> >So I guess this may be related to Netty performance.
>> >
>> >My questions:
>> >1. It seems we use a synchronous way to transfer messages in the Netty
>> >client worker: we send a message only after we receive the response to
>> >the last message request from the Netty server. Can this hurt
>> >performance?
>> >2. Although we batch messages when sending them through the Netty
>> >channel, the batch size varies. In my test I found it varies from tens
>> >of bytes to a few KB. Will a bigger, constant batch size help here? (A
>> >sketch of what I mean follows below.)
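>> >
>> >Something along these lines (purely illustrative, not the current glue
>> >code) is what I have in mind for a more constant batch size:
>> >
>> >    import java.io.ByteArrayOutputStream;
>> >
>> >    // Hypothetical sketch: accumulate serialized messages and flush
>> >    // only once a target batch size is reached (a periodic timer
>> >    // should also call flush() so light traffic is not delayed).
>> >    class Batcher {
>> >        private static final int TARGET_BATCH_BYTES = 32 * 1024;
>> >        private final ByteArrayOutputStream buffer =
>> >                new ByteArrayOutputStream();
>> >
>> >        synchronized void add(byte[] message) {
>> >            buffer.write(message, 0, message.length);
>> >            if (buffer.size() >= TARGET_BATCH_BYTES) {
>> >                flush();
>> >            }
>> >        }
>> >
>> >        synchronized void flush() {
>> >            byte[] batch = buffer.toByteArray();
>> >            buffer.reset();
>> >            // channel.write(batch) would go here (omitted in the sketch)
>> >        }
>> >    }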
>> >
>> >
>> >The following are the steps I tried to troubleshoot the problem.
>> >----------------------------------------------------------------
>> >1. Since the CPU is not fully used, I tried to scale out by adding
>> >more workers or increasing the parallelism, but the throughput doesn't
>> >improve.
>> >
>> >2. By checking a profiling tool like VisualVM, I found the spout/bolt
>> >threads spend 60% - 70% of their time just waiting, blocked on the
>> >disruptor queue; the spout spends 70% of its time sleeping and the
>> >acker spends 40% of its time waiting, while the Netty boss and worker
>> >threads and the ZooKeeper threads are busy.
>> >
>> >3. I have tried to tune all possible combinations of spout.max.pending,
>> >transfer.size, receiver.size, executor.input.size, and
>> >executor.output.size, but it doesn't work out. (The actual config keys
>> >are listed below.)
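>> >
>> >For completeness, the full Storm config keys these shorthand names
>> >refer to, as I understand them (the values shown are only examples,
>> >not recommendations):
>> >
>> >    import backtype.storm.Config;
>> >
>> >    Config conf = new Config();
>> >    conf.put("topology.max.spout.pending", 2000);
>> >    conf.put("topology.transfer.buffer.size", 1024);
>> >    conf.put("topology.receiver.buffer.size", 8);
>> >    conf.put("topology.executor.receive.buffer.size", 16384);
>> >    conf.put("topology.executor.send.buffer.size", 16384);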
>> >
>> >
>> >Sean
>>
>>
