[
https://issues.apache.org/jira/browse/HBASE-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009991#comment-13009991
]
Subbu M Iyer commented on HBASE-3654:
-------------------------------------
1. Created a dummy class with two APIs, getOnlineRegions and
buildServerReport, which exactly
mimic our HRegionServer.getOnlineRegions and HRS.buildServerLoad
(i.e., both operate on a HashMap under a sync lock).
2. Created 20 threads, with 10 threads hitting getOnlineRegions and 10 hitting
buildServerReport, each in a
loop of 100 iterations (just to simulate and recreate the locked readers
scenario that JD reported).
3. Ran the test and captured a thread dump for each of the following scenarios,
with onlineRegions
represented as:
a. HashMap, synchronized on the HashMap (as it is today)
b. ConcurrentHashMap with no synchronization
c. ConcurrentSkipListMap with no synchronization
d. CopyOnWriteArrayList
4. I could reproduce the lock scenario that JD reported in scenarios
3a, 3b, and 3c.
In case 3c I do see blocked threads waiting at
java.util.concurrent.ConcurrentSkipListMap$EntryIterator.next, and in case
3b at
java.util.concurrent.ConcurrentHashMap$EntryIterator.next(ConcurrentHashMap.java:1163).
Case 3d has no blocked threads except for one thread
blocked at java.util.Arrays.copyOf(Arrays.java:2760) during a
getOnlineRegions call.
5. I have attached all the thread dumps for your review.
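
For reference, a minimal sketch of what the test in steps 1 and 2 could look
like for scenario 3a. This is an illustration only, not the attached test:
the class names DummyRegionServer and LockContentionSketch, the region count
of 1000, and the Integer map values are all assumptions. Both methods take the
monitor of the same HashMap, as the real methods are described as doing above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the two region server methods; not HBase code. */
class DummyRegionServer {
    private final Map<String, Integer> onlineRegions = new HashMap<>();

    DummyRegionServer(int regionCount) {
        for (int i = 0; i < regionCount; i++) {
            onlineRegions.put("region-" + i, i);
        }
    }

    /** Reader path: copies the region names while holding the map's monitor. */
    List<String> getOnlineRegions() {
        synchronized (onlineRegions) {
            return new ArrayList<>(onlineRegions.keySet());
        }
    }

    /** Report path: walks every entry while holding the same monitor. */
    long buildServerReport() {
        long total = 0;
        synchronized (onlineRegions) {
            for (Map.Entry<String, Integer> e : onlineRegions.entrySet()) {
                total += e.getValue();
            }
        }
        return total;
    }
}

class LockContentionSketch {
    /**
     * Starts threadCount threads, alternating between the two methods, each
     * looping the given number of times. Returns the total calls completed.
     */
    static int run(int threadCount, int iterations) {
        DummyRegionServer server = new DummyRegionServer(1000);
        AtomicInteger calls = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            boolean reader = t % 2 == 0;
            Thread thread = new Thread(() -> {
                try {
                    start.await(); // release all threads at once to maximize contention
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < iterations; i++) {
                    if (reader) {
                        server.getOnlineRegions();
                    } else {
                        server.buildServerReport();
                    }
                    calls.incrementAndGet();
                }
            }, (reader ? "reader-" : "reporter-") + t);
            threads.add(thread);
            thread.start();
        }
        start.countDown();
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("calls completed: " + run(20, 100));
    }
}
```

With 20 threads and 100 iterations the run is short, so catching the blocked
threads in a dump (for example with jstack) may need more regions or more
iterations. For scenarios 3b and 3c, the map type would be swapped and the
synchronized blocks removed.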
> Weird blocking between getOnlineRegion and createRegionLoad
> -----------------------------------------------------------
>
> Key: HBASE-3654
> URL: https://issues.apache.org/jira/browse/HBASE-3654
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.1
> Reporter: Jean-Daniel Cryans
> Priority: Blocker
> Fix For: 0.90.2
>
>
> Saw this when debugging something else:
> {code}
> "regionserver60020" prio=10 tid=0x00007f538c1c0000 nid=0x4c7 runnable
> [0x00007f53931da000]
> java.lang.Thread.State: RUNNABLE
> at
> org.apache.hadoop.hbase.regionserver.Store.getStorefilesIndexSize(Store.java:1380)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:916)
> - locked <0x0000000672aa0a00> (a
> java.util.concurrent.ConcurrentSkipListMap)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:767)
> - locked <0x0000000656f62710> (a java.util.HashMap)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:722)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:591)
> at java.lang.Thread.run(Thread.java:662)
> "IPC Reader 9 on port 60020" prio=10 tid=0x00007f538c1be000 nid=0x4c6 waiting
> for monitor entry [0x00007f53932db000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getFromOnlineRegions(HRegionServer.java:2295)
> - waiting to lock <0x0000000656f62710> (a java.util.HashMap)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getOnlineRegion(HRegionServer.java:2307)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2333)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.isMetaRegion(HRegionServer.java:379)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:422)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:361)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer.getQosLevel(HBaseServer.java:1126)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:982)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
> - locked <0x0000000656e60068> (a
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> ...
> "IPC Reader 0 on port 60020" prio=10 tid=0x00007f538c08b000 nid=0x4bd waiting
> for monitor entry [0x00007f5393be4000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getFromOnlineRegions(HRegionServer.java:2295)
> - waiting to lock <0x0000000656f62710> (a java.util.HashMap)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getOnlineRegion(HRegionServer.java:2307)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2333)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.isMetaRegion(HRegionServer.java:379)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:422)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:361)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer.getQosLevel(HBaseServer.java:1126)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:982)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
> - locked <0x0000000656e635c8> (a
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> All the readers are blocked! I have the feeling something much better could
> be done.