When running the benchmark developed by Bobby (http://yahooeng.tumblr.com/post/64758709722/making-storm-fly-with-netty), I found that none of the CPU, memory, or network can be saturated when the message size is small (10 bytes - 100 bytes).

message size (bytes)    spout throughput (MB/s)
10                      3
20                      7
40                      16.25
80                      32.88
100                     43.13
200                     80.13
400                     138
800                     186.38
1000                    196.38
10000                   234.75

I have 4 nodes, and each node has a very powerful CPU, an E5-2680 (32 cores). The throughput reaches its peak when only 30% of the CPU on each machine is used and only 1/6 of the network bandwidth is used. So I guess this may be related to Netty performance.

My questions:

1. It seems we transfer messages synchronously in the Netty client worker: we send a message only after we receive the response to the previous message request from the Netty server. Can this hurt performance?

2. Although we batch messages when sending them through Netty's channel.send, the batch size varies. In my test, the batch size ranged from tens of bytes to a few KB. Will a bigger and constant batch size help here?

The following are the steps I tried while troubleshooting the problem.
----------------------------------------------------------------

1. Considering that the CPU is not fully used, I tried to scale out by adding more workers or increasing the parallelism, but the throughput doesn't improve.

2. Checking with a profiling tool like VisualVM, I found that the spout/bolt threads spend 60% - 70% of their time waiting, blocked on the disruptor queue; the spout spends 70% of its time sleeping; and the acker spends 40% of its time waiting, while the Netty boss and worker threads and the ZooKeeper threads are busy.

3. I have tried tuning all possible combinations of spout.max.pending, transfer.size, receiver.size, executor.input.size, and executor.output.size, but it doesn't help.

Sean
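
P.S. As a rough sanity check, the throughput figures can be converted into messages per second. The sketch below is only an illustration: the names (RESULTS, messages_per_second, message_rates) are made up for this post, and it assumes that the message size is in bytes and that 1 MB means 1,000,000 bytes.

```python
# (message size in bytes, spout throughput in MB/s), copied from the results above.
RESULTS = [
    (10, 3.0), (20, 7.0), (40, 16.25), (80, 32.88), (100, 43.13),
    (200, 80.13), (400, 138.0), (800, 186.38), (1000, 196.38), (10000, 234.75),
]

# Assumption: decimal megabytes. With 2**20 bytes per MB the rates are about 5% higher.
BYTES_PER_MB = 1_000_000


def messages_per_second(size_bytes, throughput_mb_s):
    """Convert a byte throughput into a message rate for one message size."""
    return throughput_mb_s * BYTES_PER_MB / size_bytes


def message_rates(results=RESULTS):
    """Return a list of (message size, messages per second) pairs."""
    return [(size, messages_per_second(size, mb_s)) for size, mb_s in results]


if __name__ == "__main__":
    for size, rate in message_rates():
        print(f"{size:>6} bytes  {rate:>10,.0f} msg/s")
```

Under these assumptions, the rate stays between roughly 300,000 and 430,000 messages per second for message sizes from 10 to 400 bytes, and only drops for larger messages. That might indicate a fixed per-message cost rather than a bandwidth limit, but it is only a guess based on these numbers.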
