[ 
https://issues.apache.org/jira/browse/GEODE-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726347#comment-16726347
 ] 

Brian Rowe commented on GEODE-6191:
-----------------------------------

I've run the benchmarks many more times using c5.9xlarge AWS instances with 
thread counts ranging from 4 to 256.  The results can be seen in the 
spreadsheet here: 
https://docs.google.com/spreadsheets/d/15h0MeOFdIkxToJFSHKz2DywdT7U0AohYz3I-YvR6k1o/edit?usp=sharing

In looking at the stat files for 32 vs 64 threads in the 
PartitionedPutBenchmark, we noticed that the extra time per op seemed to come 
roughly 50% from client-side handling (CachePerfStats.putTime - 
PoolStats.clientOpTime), 25% from the server side (CacheServerStats.putTime), 
and 25% from network transit time (PoolStats.clientOpTime - 
CacheServerStats.putTime).  The math can be seen on sheet 2 of the above link.
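For clarity, here is a minimal sketch of that decomposition.  The stat values 
are hypothetical placeholders chosen to illustrate the 50/25/25 split, not 
numbers from the spreadsheet:

```java
// Sketch of the per-op latency decomposition described above.
// Input values are hypothetical placeholders, not real benchmark data.
public class PutLatencyBreakdown {

    // Fraction of total put time spent in client-side handling:
    // (CachePerfStats.putTime - PoolStats.clientOpTime) / putTime
    static double clientSideFraction(double putTime, double clientOpTime, double serverPutTime) {
        return (putTime - clientOpTime) / putTime;
    }

    // Fraction spent in network transit:
    // (PoolStats.clientOpTime - CacheServerStats.putTime) / putTime
    static double networkFraction(double putTime, double clientOpTime, double serverPutTime) {
        return (clientOpTime - serverPutTime) / putTime;
    }

    // Fraction spent in server-side work: CacheServerStats.putTime / putTime
    static double serverFraction(double putTime, double clientOpTime, double serverPutTime) {
        return serverPutTime / putTime;
    }

    public static void main(String[] args) {
        // Hypothetical per-op averages in microseconds.
        double putTime = 400.0;       // CachePerfStats.putTime
        double clientOpTime = 200.0;  // PoolStats.clientOpTime
        double serverPutTime = 100.0; // CacheServerStats.putTime
        System.out.printf("client: %.0f%%, network: %.0f%%, server: %.0f%%%n",
                100 * clientSideFraction(putTime, clientOpTime, serverPutTime),
                100 * networkFraction(putTime, clientOpTime, serverPutTime),
                100 * serverFraction(putTime, clientOpTime, serverPutTime));
    }
}
```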

After noting this, we tried running with setPoolThreadLocalConnections(true) on 
the client.  With this toggled, we noticed improved scaling as threads 
increased (in fact we didn't seem to hit a hard ceiling within 256 threads), 
but it is still far from linear.  At 256 threads, each thread was only about 
25% as performant as at 32 threads.  Looking at the same stats as above, we did 
see that this change more or less eliminated the per-thread increase in 
client-side handling time.
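For reference, a minimal sketch of the client setup with this option toggled 
(the locator host/port and region name below are placeholders, not the 
benchmark's actual configuration):

```java
import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;

public class ThreadLocalPoolClient {
    public static void main(String[] args) {
        // Placeholder locator address; requires a running Geode locator/server.
        ClientCache cache = new ClientCacheFactory()
            .addPoolLocator("locator-host", 10334)
            // Give each client thread its own dedicated connection rather
            // than checking a connection out of the shared pool on every op.
            .setPoolThreadLocalConnections(true)
            .create();

        Region<String, String> region = cache
            .<String, String>createClientRegionFactory(ClientRegionShortcut.PROXY)
            .create("benchmark-region"); // placeholder region name

        region.put("key", "value"); // each thread reuses its own connection
        cache.close();
    }
}
```

This is a config sketch rather than a runnable test, since it needs a live 
locator and server to connect to.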


> Investigate scalability of benchmarks for different numbers of threads
> ----------------------------------------------------------------------
>
>                 Key: GEODE-6191
>                 URL: https://issues.apache.org/jira/browse/GEODE-6191
>             Project: Geode
>          Issue Type: Task
>          Components: benchmarks
>            Reporter: Dan Smith
>            Assignee: Brian Rowe
>            Priority: Major
>
> We should expect to see benchmark throughput scale linearly with the number 
> of threads, up to the point where we start hitting either CPU or network 
> limitations. If we do not scale, that indicates that either something in the 
> benchmark framework or Geode itself is limiting us.
> In a couple of runs in Google Cloud with 48 threads vs 192 threads on four 
> 96-CPU instances, we observed almost the same throughput (but with much 
> higher latency) with 192 threads. CPU and network stats did not indicate full 
> utilization.
> We should check the scalability of these tests again after GEODE-6172 and 
> GEODE-6148 are implemented. Try running the tests with increasing numbers of 
> threads (e.g. 4, 16, 32, 64, 128, 256, 512, etc.) in AWS on c5.9xlarge 
> instances and see when we stop scaling linearly and why.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
