Bryan Beaudreault created HBASE-26739:
-----------------------------------------
Summary: Performance regression in asyncwal
Key: HBASE-26739
URL: https://issues.apache.org/jira/browse/HBASE-26739
Project: HBase
Issue Type: Bug
Affects Versions: 2.4.6
Reporter: Bryan Beaudreault
I've been load testing hbase2 with hadoop 3.3.1 (client and server) and
comparing the results to an identical cluster running hbase/hadoop from
cdh5.16.2 (hbase ~1.2.0, hadoop ~2.6.0). With a heavy write workload I would
consistently see 99th percentile sync times of 5-6ms in cdh5, but 20-30ms in
hbase2.
Configuring hbase2 to use the 'filesystem' wal provider mostly resolves this
regression, resulting in latencies of 6-8ms. That is still a little slower than
cdh5, but much more reasonable.
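For anyone trying to reproduce the comparison, the provider switch can be sketched as an hbase-site.xml fragment like the one below. It assumes the change was made through the standard hbase.wal.provider property on the region servers; the exact configuration used in the test cluster is not shown in this report.

```xml
<!-- hbase-site.xml fragment (assumed; not copied from the test cluster). -->
<!-- Selects the FSHLog-based 'filesystem' WAL provider instead of the    -->
<!-- asyncfs provider named in the issue summary.                         -->
<property>
  <name>hbase.wal.provider</name>
  <value>filesystem</value>
</property>
```

A region server restart is normally needed for a WAL provider change to take effect.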
Unfortunately I don't have a lot of flexibility in my environment to try
various versions of hadoop. I didn't notice any exceptions or anything else to
indicate an API problem with hadoop 3.3.1, so this may just be a general
regression in the 2.x branch.
I briefly tried profiling wall clock time and the process was dominated by
waiting in SyncFuture.get. I haven't dug deep enough into the code yet to know
how to identify a bottleneck in whatever threads are responsible for completing
those futures.
One thing to note: I tried enabling hbase.wal.async.use-shared-event-loop but
noticed no difference.
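The setting mentioned above would look roughly like this in hbase-site.xml. This is a sketch that assumes "enabling" means setting the property to true on the region servers while the async wal provider is in use; the value actually used is not quoted in this report.

```xml
<!-- hbase-site.xml fragment (assumed value; the report only says the -->
<!-- property was enabled).                                           -->
<property>
  <name>hbase.wal.async.use-shared-event-loop</name>
  <value>true</value>
</property>
```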