I've been profiling HBase with my own benchmark but I'm having trouble
interpreting some of the results. Maybe someone can help shed some light.
The benchmark:
* Test table consists of 10 columns all of the same family. All values
are random selection from 10 city names.
* My read method scans all rows on 3 of those columns using HClient.
* I run HMaster and one HRegion in separate JVM's (for more controlled
profiling), both using local FS, not HDFS, all on a single machine.
* Each test is cold: run right after HRegion startup with nothing in
HMemCache.
* hbase-site.xml is attached
* profile *.cpu files are large, i'll send them individually if requested
Profiling:
I'm using YourKit 6.0.15 to profile just the CPU time of HRegionServer
running in its own JVM.
With a table of size 10k:
- Scan took 5220ms.
With a table of size 100k:
- Scan took 29366ms.
Obervations:
Within the combined RPC$server.call() of all Handler threads,
HRegionServer.next() took about 90% in both the 10K and 100K cases.
This is as expected since it's called 10K/100K times. The biggest
bottleneck bellow that is HStoreKey.extractFamily() which is 33%/39% of
HRegionServer.next() which is more expensive than all the IO operations
in HStoreScanner.getNext() (31%/31%). Couldn't there be a cheaper way to
extract family from a col key?
Outside the RPC$server.call(),
Server$Connection.readAndProcess() takes about 17%/22% of the time that
HRegionServer.next() takes suggesting a heavy overhead in RPC IO and
unmarshalling. Overhead in the other direction, ObjectWritable.write()
+ DataOutputStream.flush() take 63%/57% of the time that
HRegionServer.next() takes. The ratio of input/output makes sense since
more data is returned than sent. But this seems like a lot of RPC overhead.
I may be incorrect, but my understanding of YourKit (latest build) is
that it does not count the time that threads spend blocked. So it seems
quite a mystery to me that the following methods consume so much cpu
time (not counting methods called from within them).
Server$Handler.run() 68%
SelectorImpl.select() 7%
Relative to RPC$Server.call which is only 2%
In previous profiles i've observed that Server$Listener.run() consumed
most of the time instead of Server$Handler.run(). Both those thread
synchronize on the callQueue and I'm sure that synchronization has
something to do with this result. Unless i'm wrong about my YourKit
assumption, i don't see why sync blocking should show up like this. Does
anyone know?
As for SelectorImpl.select(), I looked it up but don't completely
understand what's happening under the hood. I know that it is blocking
too while waiting for a connection but I don't see why it should be
consuming so much CPU.
If anyone else has done YourKit profiling on HBase, please let me know
if you've found similar results or can help me clarify my results.
Right now, if i assume that YourKit is counting block time and I ignore
those numbers, I come to the following conclusions:
It looks like HStoreKey.extractFamily() is a bottleneck. For 100k rows
and 10 cols, it will be called over 1,000,000 times during a full table
scan (even if just retrieving one col) and each call requires multiple
byte[] <-> string conversions.
RPC overhead can be quite high during a scan. Could batched/read-ahead
scanning be useful to minimize RPC calls?
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.master</name>
<value>172.30.20.15:60000</value>
<description>The host and port that the HBase master runs at.
TODO: Support 'local' (All running in single context).
</description>
</property>
<property>
<name>hbase.regionserver</name>
<value>172.30.20.15:60010</value>
<description>The host and port a HBase region server runs at.
</description>
</property>
<property>
<name>hbase.regiondir</name>
<value>hbase</value>
<description>The directory shared by region servers.
</description>
</property>
<property>
<name>hbase.hregion.max.filesize</name>
<value>134217728</value>
<description>
Maximum desired file size for an HRegion. If filesize exceeds
value + (value / 2), the HRegion is split in two. Default: 128M (134217728).
</description>
</property>
<property>
<name>hbase.hregion.compactionThreshold</name>
<value>10</value>
<description>
By default, we compact the region if an HStore has more than 10 map files.
</description>
</property>
<property>
<name>hbase.regionserver.thread.splitcompactcheckfrequency</name>
<value>40000</value>
<description>
Default: 60 * 1000
</description>
</property>
<property>
<name>hbase.client.pause</name>
<value>2000</value>
</property>
<property>
<name>hbase.client.timeout.number</name>
<value>30000</value>
<description>Try this many timeouts before giving up.
</description>
</property>
<property>
<name>hbase.client.retries.number</name>
<value>15</value>
<description>Count of maximum retries fetching the root region from root
region server.
</description>
</property>
<property>
<name>hbase.master.meta.thread.rescanfrequency</name>
<value>60000</value>
<description>How long the HMaster sleeps (in milliseconds) between scans of
the root and meta tables.
</description>
</property>
<property>
<name>hbase.master.lease.period</name>
<value>30000</value>
<description>HMaster server lease period in milliseconds. Default is
30 seconds.</description>
</property>
<property>
<name>hbase.server.thread.wakefrequency</name>
<value>5000</value>
<description>Time to sleep in between searches for work (in milliseconds).
Used as sleep interval by service threads such as META scanner and log roller.
</description>
</property>
<property>
<name>hbase.regionserver.lease.period</name>
<value>30000</value>
<description>HRegion server lease period in milliseconds. Default is
30 seconds.</description>
</property>
<property>
<name>hbase.regionserver.handler.count</name>
<value>10</value>
<description>Count of RPC Server instances spun up on RegionServers
Same property is used by the HMaster for count of master handlers.
Default is 10.
</description>
</property>
<property>
<name>hbase.regionserver.msginterval</name>
<value>500</value>
<description>Interval between messages from the RegionServer to HMaster
in milliseconds. Default is 15 sec. Set this value low if you want unit
tests to be responsive.
</description>
</property>
</configuration>