[ https://issues.apache.org/jira/browse/KUDU-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255131#comment-15255131 ]

Todd Lipcon commented on KUDU-1235:
-----------------------------------

It looked like we were spending most of our time in the RPC code rather than 
the Get() code, so I did some benchmarking today of the RPC system using 
rpc-bench. Using the synchronous clients talking to localhost, I wasn't able to 
get more than 300K RPCs/sec or so on a single machine. When I switched to the 
async RPC client, I was able to drive a lot more throughput -- about 500K/sec. 
Looking at the perf results, though, I see a lot of contention on the 
service_queue mutex. So, I changed the server side to just always respond 
'SERVER_TOO_BUSY' to see the cost of the service queue and handler context 
switches. With that change, I was able to drive more than 3M RPCs per second.

So, some takeaways:
- the benchmarks we've done so far, using a small number of threads, are 
insufficient to keep the server busy. When the server isn't busy, the reactors 
continually context switch, which actually makes them show up higher in the 
profile and wastes CPU. Even though they appear to be "100% CPU", they can 
actually process more throughput if the clients provide it. Basically this is 
due to batching: there's some fixed cost to an epoll wakeup, and if more 
requests are arriving, the reactor can get more work done from a single wakeup 
(see the epoll sketch after this list).
- If we increase the RPC load (e.g. by using an async client), the service 
queue lock is relatively costly at high RPC request rates. We should look into 
lock-free MPMC queues if this is a bottleneck for the Get case (a queue sketch 
also follows below).
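
To make the batching point concrete, here's a minimal reactor-loop sketch 
using plain Linux epoll (not Kudu's actual reactor code): a single epoll_wait() 
wakeup hands back every ready connection, so the fixed wakeup cost is amortized 
over however many requests arrived since the last wakeup.
{noformat}
// Minimal reactor-loop sketch, assuming plain Linux epoll -- not Kudu's
// actual reactor implementation. One epoll_wait() wakeup returns every
// ready fd, so a busier server gets more work done per wakeup.
#include <sys/epoll.h>
#include <cstdio>

void ReactorLoop(int epfd) {
  constexpr int kMaxEvents = 256;
  struct epoll_event events[kMaxEvents];
  for (;;) {
    // Blocks until at least one connection is readable; returns all that are.
    int n = epoll_wait(epfd, events, kMaxEvents, -1);
    if (n < 0) {
      std::perror("epoll_wait");
      break;
    }
    for (int i = 0; i < n; ++i) {
      // Drain and process every request pending on events[i].data.fd here.
      // Under light load n is usually 1, so each request pays the full
      // wakeup cost; under heavy load many requests share one wakeup.
    }
  }
}
{noformat}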
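
On the service queue point, a bounded lock-free MPMC queue in the style of 
Dmitry Vyukov's array-based design is one option. Below is a rough sketch, not 
tied to Kudu's actual ServicePool/queue types: each cell carries a sequence 
number, so producers and consumers claim slots with a single CAS instead of 
serializing on one mutex, and a full queue maps naturally to SERVER_TOO_BUSY.
{noformat}
// Sketch of a bounded lock-free MPMC queue (Vyukov-style array queue).
// Illustrative only; names and types are not Kudu's.
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

template <typename T>
class MpmcQueue {
 public:
  explicit MpmcQueue(size_t capacity)   // capacity must be a power of two
      : cells_(capacity), mask_(capacity - 1), head_(0), tail_(0) {
    assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);
    for (size_t i = 0; i < capacity; ++i) {
      cells_[i].seq.store(i, std::memory_order_relaxed);
    }
  }

  // Returns false when the queue is full -- the caller could map that
  // straight to a SERVER_TOO_BUSY response.
  bool TryPush(T item) {
    size_t pos = tail_.load(std::memory_order_relaxed);
    for (;;) {
      Cell& cell = cells_[pos & mask_];
      size_t seq = cell.seq.load(std::memory_order_acquire);
      intptr_t dif = static_cast<intptr_t>(seq) - static_cast<intptr_t>(pos);
      if (dif == 0) {  // slot free: try to claim it
        if (tail_.compare_exchange_weak(pos, pos + 1,
                                        std::memory_order_relaxed)) {
          cell.value = std::move(item);
          cell.seq.store(pos + 1, std::memory_order_release);
          return true;
        }
      } else if (dif < 0) {
        return false;  // full
      } else {
        pos = tail_.load(std::memory_order_relaxed);
      }
    }
  }

  // Returns false when the queue is empty.
  bool TryPop(T* item) {
    size_t pos = head_.load(std::memory_order_relaxed);
    for (;;) {
      Cell& cell = cells_[pos & mask_];
      size_t seq = cell.seq.load(std::memory_order_acquire);
      intptr_t dif =
          static_cast<intptr_t>(seq) - static_cast<intptr_t>(pos + 1);
      if (dif == 0) {  // slot filled: try to claim it
        if (head_.compare_exchange_weak(pos, pos + 1,
                                        std::memory_order_relaxed)) {
          *item = std::move(cell.value);
          cell.seq.store(pos + mask_ + 1, std::memory_order_release);
          return true;
        }
      } else if (dif < 0) {
        return false;  // empty
      } else {
        pos = head_.load(std::memory_order_relaxed);
      }
    }
  }

 private:
  struct Cell {
    std::atomic<size_t> seq;
    T value;  // T must be default-constructible for this sketch
  };
  // Cache-line padding between the indexes and the cells is omitted here.
  std::vector<Cell> cells_;
  const size_t mask_;
  std::atomic<size_t> head_;  // consumer index
  std::atomic<size_t> tail_;  // producer index
};
{noformat}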

In terms of generating more load, maybe we should have an async Get API? I 
think that also makes sense in order to implement features like multiget, 
which are quite commonly used.
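
For reference, here is a very rough sketch of what a callback-based Get could 
look like from the client side. KuduAsyncGetter, GetAsync() and GetResult are 
made-up names, not existing client APIs; the point is just that one thread can 
keep many lookups in flight, and multiget becomes a thin layer on top.
{noformat}
// Hypothetical client-side shape for an async Get; none of these names exist
// in the Kudu client today. The stub completes immediately instead of sending
// a real RPC.
#include <functional>
#include <string>
#include <vector>

struct GetResult {
  bool found = false;
  std::string row;  // placeholder for a real row result type
};

class KuduAsyncGetter {
 public:
  // Issues a primary-key lookup without blocking; 'cb' runs when the lookup
  // completes, so one thread can keep many lookups outstanding.
  void GetAsync(const std::string& primary_key,
                std::function<void(const GetResult&)> cb) {
    // A real implementation would RPC the tablet server owning 'primary_key'
    // and invoke 'cb' from the reactor; this stub just returns "not found".
    (void)primary_key;
    cb(GetResult{});
  }
};

// Multiget as a thin layer over the async primitive: fire every lookup,
// aggregate results as the callbacks arrive (synchronization omitted).
void MultiGet(KuduAsyncGetter* getter, const std::vector<std::string>& keys) {
  for (const auto& key : keys) {
    getter->GetAsync(key, [key](const GetResult& result) {
      // Collect 'result' for 'key' into the multiget response here.
      (void)key;
      (void)result;
    });
  }
}
{noformat}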


> Add Get API
> -----------
>
>                 Key: KUDU-1235
>                 URL: https://issues.apache.org/jira/browse/KUDU-1235
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>         Attachments: perf-get.svg, perf-scan-opt.svg, perf-scan.svg
>
>
> A Get API is more user friendly and efficient if users just want primary key 
> lookups.
> I set up a cluster and tested single-row get/scan using YCSB; initial tests 
> show better performance for get.
> {noformat}
> kudu_workload:
> recordcount=1000000
> operationcount=1000000
> workload=com.yahoo.ycsb.workloads.CoreWorkload
> readallfields=false
> readproportion=1
> updateproportion=0
> scanproportion=0
> insertproportion=0
> requestdistribution=uniform
> use_get_api=false
> load:
> ./bin/ycsb load kudu -P workloads/kudu_workload -p sync_ops=false -p pre_split_num_tablets=1 -p table_name=ycsb_wiki_example -p masterQuorum='c3-kudu-tst-st01.bj:32600' -threads 100
> read test:
> ./bin/ycsb run kudu -P workloads/kudu_workload -p masterQuorum='c3-kudu-tst-st01.bj:32600' -threads 100
> {noformat}
> Get API:
> [OVERALL], RunTime(ms), 21304.0
> [OVERALL], Throughput(ops/sec), 46939.54187007135
> [CLEANUP], Operations, 100.0
> [CLEANUP], AverageLatency(us), 423.57
> [CLEANUP], MinLatency(us), 24.0
> [CLEANUP], MaxLatency(us), 19327.0
> [CLEANUP], 95thPercentileLatency(us), 52.0
> [CLEANUP], 99thPercentileLatency(us), 18815.0
> [READ], Operations, 1000000.0
> [READ], AverageLatency(us), 2065.185152
> [READ], MinLatency(us), 134.0
> [READ], MaxLatency(us), 92159.0
> [READ], 95thPercentileLatency(us), 2391.0
> [READ], 99thPercentileLatency(us), 6359.0
> [READ], Return=0, 1000000
> Scan API:
> [OVERALL], RunTime(ms), 38259.0
> [OVERALL], Throughput(ops/sec), 26137.6408165399
> [CLEANUP], Operations, 100.0
> [CLEANUP], AverageLatency(us), 47.32
> [CLEANUP], MinLatency(us), 16.0
> [CLEANUP], MaxLatency(us), 1837.0
> [CLEANUP], 95thPercentileLatency(us), 41.0
> [CLEANUP], 99thPercentileLatency(us), 158.0
> [READ], Operations, 1000000.0
> [READ], AverageLatency(us), 3595.825249
> [READ], MinLatency(us), 139.0
> [READ], MaxLatency(us), 3139583.0
> [READ], 95thPercentileLatency(us), 3775.0
> [READ], 99thPercentileLatency(us), 7659.0
> [READ], Return=0, 1000000



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
