[jira] [Commented] (HBASE-25998) Revisit synchronization in SyncFuture

Bharath Vissapragada (Jira) Wed, 16 Jun 2021 14:21:05 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-25998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364520#comment-17364520
 ]


Bharath Vissapragada commented on HBASE-25998:
----------------------------------------------

Thanks [~apurtell] for trying out the patch (and review).

One interesting observation is that the large throughput difference shows up 
only with the Async WAL implementation. It is not clear to me why; perhaps 
there is a lot more contention in that implementation for some reason. I 
repeated the same set of tests with FSHLog on branch-1 and master, and there 
the patch performs only slightly better (a few single-digit percentage 
points). The YCSB runs on branch-1 (on a 3-node containerized EC2 cluster) 
confirmed this behavior.

Without patch: branch-1/FSHLog (10M ingest only)
{noformat}
[OVERALL], RunTime(ms), 199938
[OVERALL], Throughput(ops/sec), 50015.50480649001
[TOTAL_GCS_PS_Scavenge], Count, 293
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 1222
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.611189468735308
[TOTAL_GCS_PS_MarkSweep], Count, 1
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 34
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.017005271634206603
[TOTAL_GCs], Count, 294
[TOTAL_GC_TIME], Time(ms), 1256
[TOTAL_GC_TIME_%], Time(%), 0.6281947403695145
[CLEANUP], Operations, 512
[CLEANUP], AverageLatency(us), 41.0234375
[CLEANUP], MinLatency(us), 0
[CLEANUP], MaxLatency(us), 18527
[CLEANUP], 95thPercentileLatency(us), 13
[CLEANUP], 99thPercentileLatency(us), 37
[INSERT], Operations, 10000000
[INSERT], AverageLatency(us), 5085.9494093
[INSERT], MinLatency(us), 1499
[INSERT], MaxLatency(us), 220927
[INSERT], 95thPercentileLatency(us), 6511
[INSERT], 99thPercentileLatency(us), 16655
[INSERT], Return=OK, 10000000
{noformat}
With patch: branch-1/FSHLog (10M ingest only)
{noformat}
[OVERALL], RunTime(ms), 195064
[OVERALL], Throughput(ops/sec), 51265.2257720543
[TOTAL_GCS_PS_Scavenge], Count, 284
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 1184
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.6069802731411229
[TOTAL_GCS_PS_MarkSweep], Count, 1
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 33
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.01691752450477792
[TOTAL_GCs], Count, 285
[TOTAL_GC_TIME], Time(ms), 1217
[TOTAL_GC_TIME_%], Time(%), 0.6238977976459008
[CLEANUP], Operations, 512
[CLEANUP], AverageLatency(us), 45.783203125
[CLEANUP], MinLatency(us), 1
[CLEANUP], MaxLatency(us), 20591
[CLEANUP], 95thPercentileLatency(us), 14
[CLEANUP], 99thPercentileLatency(us), 37
[INSERT], Operations, 10000000
[INSERT], AverageLatency(us), 4958.6662675
[INSERT], MinLatency(us), 1380
[INSERT], MaxLatency(us), 295935
[INSERT], 95thPercentileLatency(us), 6335
[INSERT], 99thPercentileLatency(us), 19071
[INSERT], Return=OK, 10000000
{noformat}
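To put a number on "slightly better", the relative change between the two runs 
can be computed directly from the YCSB figures above. The following is only a 
small sketch; the class and method names (YcsbDelta, pctChange) are made up for 
illustration and are not part of YCSB or HBase. With the numbers above, 
throughput and average insert latency both improve by roughly 2.5%, while the 
99th percentile insert latency is about 14.5% higher in the patched run.

```java
public class YcsbDelta {
  /** Relative change of 'after' versus 'before', in percent (hypothetical helper). */
  public static double pctChange(double before, double after) {
    return (after - before) / before * 100.0;
  }

  public static void main(String[] args) {
    // Values copied from the two branch-1/FSHLog runs (without patch, with patch).
    System.out.printf("Throughput(ops/sec):   %+.2f%%%n",
        pctChange(50015.50480649001, 51265.2257720543));
    System.out.printf("INSERT avg latency:    %+.2f%%%n",
        pctChange(5085.9494093, 4958.6662675));
    System.out.printf("INSERT p95 latency:    %+.2f%%%n",
        pctChange(6511, 6335));
    System.out.printf("INSERT p99 latency:    %+.2f%%%n",
        pctChange(16655, 19071));
  }
}
```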
Unfortunately, the tooling I have does not support branch-2/master (yet), so 
I cannot repeat this YCSB run for the Async WAL implementation. But if the 
WALPE runs are any indication, we should see a good enough throughput 
improvement.

> Revisit synchronization in SyncFuture
> -------------------------------------
>
>                 Key: HBASE-25998
>                 URL: https://issues.apache.org/jira/browse/HBASE-25998
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, regionserver, wal
>    Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0
>            Reporter: Bharath Vissapragada
>            Assignee: Bharath Vissapragada
>            Priority: Major
>         Attachments: monitor-overhead-1.png, monitor-overhead-2.png
>
>
> While working on HBASE-25984, I noticed some weird frames in the flame graphs 
> around monitor entry/exit consuming a lot of CPU cycles (see attached 
> images). I noticed that the synchronization there is too coarse-grained and 
> sometimes unnecessary. I did a simple patch that switched to reentrant-lock 
> based synchronization with a condition variable rather than a busy wait, and 
> that showed 70-80% increased throughput in WAL PE. Seems too good to be 
> true... (more details in the comments).
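
For readers who have not looked at the patch, the general pattern described 
above (a reentrant lock plus a condition variable instead of waiting on the 
object monitor) can be sketched roughly as follows. This is not the actual 
SyncFuture code or the patch; the class name LockBasedFuture and its methods 
are hypothetical and only illustrate the technique.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for a sync future; NOT the HBase SyncFuture class. */
public class LockBasedFuture {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition doneCondition = lock.newCondition();
  private boolean done = false; // guarded by lock
  private long doneTxid = -1L;  // guarded by lock

  /** Called by the completing thread: publish the result and wake all waiters. */
  public void done(long txid) {
    lock.lock();
    try {
      if (done) {
        return; // already completed, keep the first result
      }
      done = true;
      doneTxid = txid;
      doneCondition.signalAll();
    } finally {
      lock.unlock();
    }
  }

  /** Blocks until done() has been called or the timeout elapses. */
  public long get(long timeoutNanos) throws InterruptedException, TimeoutException {
    lock.lock();
    try {
      long remaining = timeoutNanos;
      // Loop to guard against spurious wakeups; awaitNanos returns the time left.
      while (!done) {
        if (remaining <= 0) {
          throw new TimeoutException("Not done after " + timeoutNanos + " ns");
        }
        remaining = doneCondition.awaitNanos(remaining);
      }
      return doneTxid;
    } finally {
      lock.unlock();
    }
  }
}
```

In this sketch a waiter parks on the condition and is woken explicitly by 
done(), so it neither spins nor wakes up periodically just to re-check the 
flag.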



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
