[ https://issues.apache.org/jira/browse/KUDU-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297650#comment-15297650 ]

Todd Lipcon commented on KUDU-1465:
-----------------------------------

In my test setup with ~70 tablet servers and a 50-thread YCSB client on each, 
running workload C (100% random-read) across a 1B-row table, changing the batch 
size from the default 1MB down to 4KB increased the read throughput from 670K 
ops/sec to 1044K ops/sec (56% improvement). This test was running with 
jemalloc, but I imagine this would help tcmalloc as well.
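
The gap is consistent with the allocator behavior described in the issue text below: small blocks are served from per-thread caches, while 1MB blocks fall through to centrally locked structures. As a rough illustration (not Kudu code; the thread and iteration counts are arbitrary), a stand-alone micro-benchmark like the following, linked against jemalloc or tcmalloc, shows the difference in allocate/free cost between the two block sizes:

    // Stand-alone micro-benchmark (not Kudu code): each thread repeatedly
    // allocates and frees a fixed-size block. With tcmalloc or jemalloc,
    // 4KB requests are typically served from per-thread caches, while 1MB
    // requests hit central/arena-level structures and contend across threads.
    #include <chrono>
    #include <cstddef>
    #include <cstdlib>
    #include <cstring>
    #include <iostream>
    #include <thread>
    #include <vector>

    static void AllocLoop(std::size_t block_size, int iters) {
      for (int i = 0; i < iters; ++i) {
        void* buf = malloc(block_size);
        memset(buf, 0, 64);  // touch the block so the allocation isn't optimized away
        free(buf);
      }
    }

    static double RunBenchSeconds(std::size_t block_size, int num_threads, int iters) {
      auto start = std::chrono::steady_clock::now();
      std::vector<std::thread> threads;
      for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back(AllocLoop, block_size, iters);
      }
      for (auto& t : threads) t.join();
      return std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
    }

    int main() {
      const int kThreads = 50;   // loosely mirrors the 50-thread YCSB clients
      const int kIters = 100000;
      std::cout << "4KB blocks: " << RunBenchSeconds(4 * 1024, kThreads, kIters) << "s\n";
      std::cout << "1MB blocks: " << RunBenchSeconds(1024 * 1024, kThreads, kIters) << "s\n";
      return 0;
    }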

> Large allocations for scanner result buffers harm allocator thread caching
> --------------------------------------------------------------------------
>
>                 Key: KUDU-1465
>                 URL: https://issues.apache.org/jira/browse/KUDU-1465
>             Project: Kudu
>          Issue Type: Bug
>          Components: perf
>    Affects Versions: 0.8.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> I was looking at the performance of a random-read stress test on a 70 node 
> cluster and found that threads were often spending time in allocator 
> contention, particularly when deallocating RpcSidecar objects. After a bit of 
> analysis, I determined this is because we always preallocate buffers of 1MB 
> (the default batch size) even if the response is only going to be a single 
> row. Such large allocations go directly to the central freelist instead of 
> using thread-local caches.
>
> As a simple test, I used the set_flag command to drop the default batch size 
> to 4KB, and the read throughput (reads/second) increased substantially.
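
One way to picture the mitigation this points at (a sketch only, not the actual Kudu change; the class name and initial capacity below are assumptions): size the response buffer to what the scan actually returns, growing geometrically from a small starting capacity so a single-row response never triggers a 1MB allocation.

    // Minimal sketch (not the actual Kudu patch): grow the result buffer
    // geometrically from a small initial capacity instead of preallocating
    // the full 1MB batch size, so a single-row response stays in a size
    // class that the allocator's thread-local cache can serve.
    #include <cstddef>
    #include <cstdint>
    #include <string>

    class ResultBuffer {  // hypothetical name, for illustration only
     public:
      // Assumed initial capacity; small enough for the thread-local cache.
      static constexpr std::size_t kInitialCapacity = 4 * 1024;

      ResultBuffer() { data_.reserve(kInitialCapacity); }

      void AppendRow(const uint8_t* row, std::size_t len) {
        // std::string grows geometrically, so large scans still amortize to
        // O(1) per append while small responses never reach 1MB.
        data_.append(reinterpret_cast<const char*>(row), len);
      }

      const std::string& data() const { return data_; }

     private:
      std::string data_;
    };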


