Also, I honestly did not try to optimize the throughput/latency at all. My goal was to profile it in comparison to the default ZeroMQ implementation to be sure that there was no regression.
—Bobby

On 4/1/14, 8:17 AM, "Sean Zhong" <[email protected]> wrote:

>I did some experiments, and was able to double the max throughput for small
>messages (100 bytes) by changing the Netty glue code. But even after that,
>there is still a scalability problem: the resources cannot be fully used.
>I realized that in your test you only have an 8-core CPU, so for you the
>CPU is a bottleneck, while I have a 32-core (64 virtual cores) CPU, so this
>problem is exposed.
>
>You are right, it is a "latency vs throughput" problem. Especially if we
>buffer too much, it is possible that the buffering holds the whole
>spout.max.pending window, and then there is no traffic and the topology
>will just wait for nothing. I am still experimenting to find a better
>balance between latency and throughput.
>
>
>Sean
>
>
>On Tue, Apr 1, 2014 at 4:16 AM, Bobby Evans <[email protected]> wrote:
>
>> You are correct that we do not send a new batch of messages until the
>> current batch has been acked. It should not be too difficult to switch to
>> pipelining the messages so more than one batch is in flight at any point
>> in time, but we wanted to get accuracy before digging more deeply into
>> performance.
>>
>> As for the fixed batch size, that is a latency vs throughput question,
>> and it is likely to vary depending on the use case you have.
>>
>> The bigger problem that I have seen is with the number of threads that
>> Netty is using for larger topologies. I think we have a fix for that, but
>> Andy and I have not had the time to put together a patch for the
>> community yet. I will try to get to it this week.
>>
>> —Bobby
>>
>> On 3/26/14, 6:27 AM, "Sean Zhong" <[email protected]> wrote:
>>
>> >When running the benchmark developed by Bobby
>> >(http://yahooeng.tumblr.com/post/64758709722/making-storm-fly-with-netty),
>> >I found that neither the CPU, memory, nor network can be saturated when
>> >the message size is small (10 bytes - 100 bytes).
>> >
>> >message size (bytes)    spout throughput (MB/s)
>> >10                      3207
>> >40                      16.25
>> >80                      32.88
>> >100                     43.13
>> >200                     80.13
>> >400                     138
>> >800                     186.38
>> >1000                    196.38
>> >10000                   234.75
>> >
>> >I have 4 nodes, and each node has a very powerful CPU, an E5-2680 (32
>> >cores). The throughput reaches its peak when only 30% of each machine's
>> >CPU is used, and only 1/6 of the network bandwidth is used.
>> >
>> >So I guess this may relate to Netty performance.
>> >
>> >My questions:
>> >1. It seems we transfer messages in a synchronous way in the Netty
>> >client worker: we send a message only after we receive the response to
>> >the last message request from the Netty server. Can this hurt
>> >performance?
>> >2. Although we batch the messages when sending them through the Netty
>> >channel.send, the batch size varies. In my test, I found the batch size
>> >varies from tens of bytes to a few KB. Would a bigger and constant batch
>> >size help here?
>> >
>> >The following are the steps I took to troubleshoot the problem.
>> >----------------------------------------------------------------
>> >1. Considering the CPU is not fully used, I tried to scale out by adding
>> >more workers or increasing the parallelism, but the throughput doesn't
>> >improve.
>> >
>> >2. Checking with a profiling tool like VisualVM, I found the spout/bolt
>> >threads spend 60% - 70% of their time waiting, blocked on the disruptor
>> >queue; the spout spends 70% of its time sleeping and the acker spends
>> >40% of its time waiting, while the Netty boss and worker threads and the
>> >ZooKeeper threads are busy.
>> >
>> >3. I have tried to tune all possible combinations of spout.max.pending,
>> >transfer.size, receiver.size, executor.input.size, and
>> >executor.output.size, but it doesn't work out.
>> >
>> >
>> >Sean
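For the stop-and-wait vs. pipelining question discussed in the thread, a minimal sketch of bounded pipelining follows. This is not Storm's actual Netty client; the class and method names (PipelinedBatchSender, writeToChannel, onAck) are hypothetical stand-ins. With a window of 1 it reproduces the send-a-batch-then-wait-for-the-ack behavior; a larger window keeps several batches in flight while still bounding how much can be buffered (the "holds the whole spout.max.pending window" concern above).

    import java.util.concurrent.Semaphore;

    // Bounded pipelining: at most maxInFlight batches may be unacked at once.
    // maxInFlight = 1 is stop-and-wait; larger values keep the pipe full.
    public class PipelinedBatchSender {
        private final Semaphore inFlight;

        public PipelinedBatchSender(int maxInFlight) {
            this.inFlight = new Semaphore(maxInFlight);
        }

        // Blocks only when maxInFlight batches are already awaiting an ack.
        public void send(byte[] batch) throws InterruptedException {
            inFlight.acquire();
            writeToChannel(batch); // stand-in for the real Netty channel write
        }

        // Called when the server acknowledges a batch, freeing a window slot.
        public void onAck() {
            inFlight.release();
        }

        private void writeToChannel(byte[] batch) {
            // channel.write(batch) in the real transport; omitted in this sketch
        }
    }

The window size is the latency/throughput knob both sides of the thread are describing: a bigger window amortizes round trips, a smaller one limits how long tuples sit buffered.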

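The tuning parameters abbreviated at the end of the thread correspond, if I have the names right, to the Storm 0.9.x-era config keys sketched below. The values are purely illustrative, not recommendations, and the two Netty thread-count keys (normally set cluster-wide in storm.yaml) are the ones I believe relate to the thread-count issue Bobby mentions.

    import backtype.storm.Config;

    // Illustrative values only; key names and defaults may differ by Storm version.
    public class TransportTuningSketch {
        public static Config buildConf() {
            Config conf = new Config();
            conf.setMaxSpoutPending(5000);                                  // spout.max.pending
            conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 1024);           // worker transfer queue
            conf.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 8);              // worker receive batching
            conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);  // executor input queue
            conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);     // executor output queue
            // Netty thread pools; usually configured in storm.yaml rather than per topology
            conf.put("storm.messaging.netty.server_worker_threads", 1);
            conf.put("storm.messaging.netty.client_worker_threads", 1);
            return conf;
        }
    }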