Yes, that is indeed the problem. It is caused by two design decisions:

1) HBase has a fixed number of RPC handlers (30 by default) -- a reasonable
design choice
2) RPC handlers block on HDFS reads -- also a reasonable design choice

Under a high enough load of I/O-intensive requests, all RPC handlers end up
blocked on disk, so no progress can be made even for requests that do not
require any I/O.

However, increasing the number of handler threads seems to be an incomplete
solution -- with a high enough I/O-intensive load, you run into the same
problem again...
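To make the failure mode concrete, here is a minimal toy model (plain
Python, not HBase code -- the pool size and sleep duration are made-up
stand-ins for the handler pool and a blocking HDFS read):

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Toy model of the RPC handler pool: a fixed pool of "handlers"
# serves both I/O-bound and cache-hit requests.
HANDLERS = 4  # stand-in for hbase.regionserver.handler.count

def io_bound_get():
    time.sleep(0.2)  # stands in for a blocking HDFS read
    return "io"

def cached_get():
    return "cached"  # served from the block cache, no I/O needed

pool = ThreadPoolExecutor(max_workers=HANDLERS)

# Fill every handler with I/O-bound requests...
io_futures = [pool.submit(io_bound_get) for _ in range(HANDLERS)]

# ...then submit a request that needs no I/O at all.
start = time.monotonic()
fast = pool.submit(cached_get)
fast.result()
wait = time.monotonic() - start

# The cached Get had to wait for a handler to unblock, even though
# it needed no disk access itself -- roughly one full I/O delay.
print(f"cached Get waited {wait:.2f}s")
pool.shutdown()
```

A bigger pool just moves the cliff: with enough concurrent I/O-bound
requests, all handlers are blocked again, which is the incomplete-solution
point above.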



On Sat, Apr 1, 2017 at 3:47 PM, Enis Söztutar <[email protected]> wrote:

> I think the problem is that you ONLY have 30 "handler" threads (
> hbase.regionserver.handler.count). Handlers are the main thread pool that
> executes the RPC requests. When you do IO-bound requests, very likely
> all of the 30 threads are just blocked by disk access, so the
> total throughput drops.
>
> It is typical to run with 100-300 threads on the regionserver side,
> depending on your settings. You can use the "Debug dump" from the
> regionserver web UI or jstack to inspect what the "handler" threads are
> doing.
>
> Enis
>
> On Fri, Mar 31, 2017 at 7:57 PM, 杨苏立 Yang Su Li <[email protected]>
> wrote:
>
> > On Fri, Mar 31, 2017 at 9:39 PM, Ted Yu <[email protected]> wrote:
> >
> > > Can you tell us which release of hbase you used ?
> > >
> >
> > 2.0.0-SNAPSHOT
> >
> > >
> > > Please describe values for the config parameters in hbase-site.xml
> > >
> > > The content of hbase-site.xml is shown below, but indeed this problem
> > is not sensitive to configuration -- we can reproduce the same problem
> > with different configurations and across different hbase versions.
> >
> >
> > > Do you have SSD(s) in your cluster ?
> > > If so and the mixed workload involves writes, have you taken a look at
> > > HBASE-12848
> > > ?
> > >
> > No, we don't use SSD (for hbase). And the workload does not involve
> > writes (even though workloads with writes show similar behavior). As I
> > stated, both clients are doing 1KB Gets.
> >
> > <configuration>
> >
> > <property>
> > <name>hbase-master</name>
> > <value>node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:60000
> </value>
> > </property>
> >
> > <property>
> > <name>hbase.rootdir</name>
> > <value>hdfs://
> > node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:9000/hbase</value>
> > </property>
> >
> > <property>
> > <name>hbase.fs.tmp.dir</name>
> > <value>hdfs://
> > node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:9000/hbase-staging
> > </value>
> > </property>
> >
> > <property>
> > <name>hbase.cluster.distributed</name>
> > <value>true</value>
> > </property>
> >
> > <property>
> > <name>hbase.zookeeper.property.dataDir</name>
> > <value>/tmp/zookeeper</value>
> > </property>
> >
> > <property>
> > <name>hbase.zookeeper.property.clientPort</name>
> > <value>2181</value>
> > </property>
> >
> > <property>
> > <name>hbase.zookeeper.quorum</name>
> > <value>node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us</value>
> > </property>
> >
> > <property>
> > <name>hbase.ipc.server.read.threadpool.size</name>
> > <value>10</value>
> > </property>
> >
> > <property>
> > <name>hbase.regionserver.handler.count</name>
> > <value>30</value>
> > </property>
> >
> > </configuration>
> >
> >
> >
> > >
> > > Cheers
> > >
> > > On Fri, Mar 31, 2017 at 7:29 PM, 杨苏立 Yang Su Li <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > We found that when there is a mix of CPU-intensive and I/O-intensive
> > > > workloads, HBase seems to slow everything down to the disk
> > > > throughput level.
> > > >
> > > > This is shown in the performance graph at
> > > > http://pages.cs.wisc.edu/~suli/blocking-orig.pdf : both client-1 and
> > > > client-2 are issuing 1KB Gets. From second 0, both repeatedly access
> > > > a small set of data that is cacheable, and both get high throughput
> > > > (~45K ops/s). At second 60, client-1 switches to an I/O-intensive
> > > > workload and begins to randomly access a large set of data (which
> > > > does not fit in cache). *Both* client-1's and client-2's throughput
> > > > drops to ~0.5K ops/s.
> > > >
> > > > Is this acceptable behavior for HBase, or is it considered a bug or
> > > > a performance drawback?
> > > > I found an old JIRA entry about a similar problem (
> > > > https://issues.apache.org/jira/browse/HBASE-8836), but it was never
> > > > resolved.
> > > >
> > > > Thanks.
> > > >
> > > > Suli
> > > >
> > > > --
> > > > Suli Yang
> > > >
> > > > Department of Physics
> > > > University of Wisconsin Madison
> > > >
> > > > 4257 Chamberlin Hall
> > > > Madison WI 53703
> > > >
> > >
> >
> >
> >
> > --
> > Suli Yang
> >
> > Department of Physics
> > University of Wisconsin Madison
> >
> > 4257 Chamberlin Hall
> > Madison WI 53703
> >
>



-- 
Suli Yang

Department of Physics
University of Wisconsin Madison

4257 Chamberlin Hall
Madison WI 53703
