I think the problem is that you ONLY have 30 "handler" threads
(hbase.regionserver.handler.count). Handlers are the main thread pool that
executes RPC requests. When you issue I/O-bound requests, very likely all 30
threads are simply blocked on disk access, so the total throughput drops.
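The starvation effect described above can be sketched with a plain fixed-size thread pool. This is a minimal illustration, not HBase code: the class and variable names are made up, and the 500 ms sleep stands in for a slow disk seek.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class HandlerStarvation {
    public static void main(String[] args) throws Exception {
        // A tiny fixed-size "handler" pool, standing in for
        // hbase.regionserver.handler.count (hypothetically set to 2 here)
        ExecutorService handlers = Executors.newFixedThreadPool(2);

        // I/O-bound requests occupy every handler; the sleep models a disk seek
        for (int i = 0; i < 2; i++) {
            handlers.submit(() -> {
                try { Thread.sleep(500); } catch (InterruptedException e) { }
            });
        }

        // A cheap, cache-hit request now has to wait for a free handler
        long start = System.nanoTime();
        Future<?> cachedGet = handlers.submit(() -> { });
        cachedGet.get();
        long waitedMs = (System.nanoTime() - start) / 1_000_000;

        // Although the cached request does no work itself, it is delayed by
        // roughly the full disk-access time of the requests ahead of it
        System.out.println("cached request waited ~" + waitedMs + " ms");
        handlers.shutdown();
    }
}
```

With every handler blocked, even requests that would be served instantly from cache queue up behind the disk-bound ones, which is consistent with both clients' throughput collapsing together.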
It is typical to run with 100-300 threads on the regionserver side,
depending on your settings. You can use the "Debug dump" from the
regionserver web UI, or jstack, to inspect what the "handler" threads are
doing.

Enis

On Fri, Mar 31, 2017 at 7:57 PM, 杨苏立 Yang Su Li <[email protected]> wrote:

> On Fri, Mar 31, 2017 at 9:39 PM, Ted Yu <[email protected]> wrote:
>
> > Can you tell us which release of hbase you used ?
>
> 2.0.0 Snapshot
>
> > Please describe values for the config parameters in hbase-site.xml
>
> The content of hbase-site.xml is shown below, but this problem is really
> not sensitive to configuration -- we can reproduce it with different
> configurations and across different HBase versions.
>
> > Do you have SSD(s) in your cluster ?
> > If so and the mixed workload involves writes, have you taken a look at
> > HBASE-12848 ?
>
> No, we don't use SSDs (for HBase). And the workload does not involve
> writes (though workloads with writes show similar behavior). As I stated,
> both clients are doing 1KB Gets.
> <configuration>
>
> <property>
> <name>hbase-master</name>
> <value>node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:60000</value>
> </property>
>
> <property>
> <name>hbase.rootdir</name>
> <value>hdfs://node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:9000/hbase</value>
> </property>
>
> <property>
> <name>hbase.fs.tmp.dir</name>
> <value>hdfs://node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:9000/hbase-staging</value>
> </property>
>
> <property>
> <name>hbase.cluster.distributed</name>
> <value>true</value>
> </property>
>
> <property>
> <name>hbase.zookeeper.property.dataDir</name>
> <value>/tmp/zookeeper</value>
> </property>
>
> <property>
> <name>hbase.zookeeper.property.clientPort</name>
> <value>2181</value>
> </property>
>
> <property>
> <name>hbase.zookeeper.quorum</name>
> <value>node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us</value>
> </property>
>
> <property>
> <name>hbase.ipc.server.read.threadpool.size</name>
> <value>10</value>
> </property>
>
> <property>
> <name>hbase.regionserver.handler.count</name>
> <value>30</value>
> </property>
>
> </configuration>
>
> > Cheers
> >
> > On Fri, Mar 31, 2017 at 7:29 PM, 杨苏立 Yang Su Li <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > We found that when there is a mix of CPU-intensive and I/O-intensive
> > > workloads, HBase seems to slow everything down to the disk-throughput
> > > level.
> > >
> > > This is shown in the performance graph at
> > > http://pages.cs.wisc.edu/~suli/blocking-orig.pdf : both client-1 and
> > > client-2 are issuing 1KB Gets. From second 0, both repeatedly access a
> > > small set of data that is cacheable, and both get high throughput
> > > (~45K ops/s). At second 60, client-1 switches to an I/O-intensive
> > > workload and begins to randomly access a large set of data (which does
> > > not fit in cache). *Both* client-1's and client-2's throughput drops
> > > to ~0.5K ops/s.
> > > Is this acceptable behavior for HBase, or is it considered a bug or a
> > > performance drawback? I could find an old JIRA entry about a similar
> > > problem (https://issues.apache.org/jira/browse/HBASE-8836), but it was
> > > never resolved.
> > >
> > > Thanks.
> > >
> > > Suli
> > >
> > > --
> > > Suli Yang
> > >
> > > Department of Physics
> > > University of Wisconsin Madison
> > >
> > > 4257 Chamberlin Hall
> > > Madison WI 53703
>
> --
> Suli Yang
>
> Department of Physics
> University of Wisconsin Madison
>
> 4257 Chamberlin Hall
> Madison WI 53703
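Following the suggestion in the reply above (100-300 handler threads are typical), the corresponding hbase-site.xml change would look like the fragment below. The value 150 is only an illustrative middle of that range; the right number depends on your hardware and workload.

```xml
<property>
  <name>hbase.regionserver.handler.count</name>
  <!-- Illustrative value; Enis suggests 100-300 depending on settings.
       More handlers let cache-hit requests proceed while others block on disk,
       at the cost of more threads and memory on the regionserver. -->
  <value>150</value>
</property>
```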
