[
https://issues.apache.org/jira/browse/HBASE-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010925#comment-13010925
]
Ted Yu commented on HBASE-3654:
-------------------------------
Subbu's patch only removed synchronization; the potential issue above existed
before that change.
A bigger problem could come from this change:
{code}
public int getNumberOfOnlineRegions() {
int size = -1;
- synchronized (this.onlineRegions) {
- size = this.onlineRegions.size();
+ size = this.onlineRegions.size();
- }
return size;
}
{code}
because it is used in:
{code}
public HRegionInfo[] getRegionsAssignment() throws IOException {
- synchronized (this.onlineRegions) {
- HRegionInfo [] regions = new HRegionInfo[getNumberOfOnlineRegions()];
- Iterator<HRegion> ite = onlineRegions.values().iterator();
- for (int i = 0; ite.hasNext(); i++) {
- regions[i] = ite.next().getRegionInfo();
- }
- return regions;
- }
+ HRegionInfo [] regions = new HRegionInfo[getNumberOfOnlineRegions()];
+ Iterator<HRegion> ite = onlineRegions.values().iterator();
+ for (int i = 0; ite.hasNext(); i++) {
+ regions[i] = ite.next().getRegionInfo();
+ }
+ return regions;
}
{code}
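Because locking is now relaxed, another thread can add a region between the
getNumberOfOnlineRegions() call and the iteration, so the regions array can be
one (or more) elements shorter than onlineRegions.values(). The loop then
writes past the end of the array and throws ArrayIndexOutOfBoundsException.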
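To make the failure mode concrete, here is a minimal single-threaded sketch that mimics the unsynchronized getRegionsAssignment() loop. It assumes onlineRegions is now a ConcurrentHashMap, as the attachment names suggest. The class PreSizedArrayRace, the String values standing in for HRegion/HRegionInfo, and the betweenSizeAndIterate hook (which plays the role of another thread opening a region at the worst moment) are all hypothetical.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PreSizedArrayRace {
    // Same shape as getRegionsAssignment() with the lock removed.
    // Strings stand in for HRegion/HRegionInfo (hypothetical simplification).
    public static String[] snapshot(Map<String, String> onlineRegions,
                                    Runnable betweenSizeAndIterate) {
        // Array is sized from a size() read that is not atomic with the iteration.
        String[] regions = new String[onlineRegions.size()];
        // Simulates another thread opening a region after size() was read.
        betweenSizeAndIterate.run();
        Iterator<String> ite = onlineRegions.values().iterator();
        for (int i = 0; ite.hasNext(); i++) {
            regions[i] = ite.next(); // overflows if the map grew
        }
        return regions;
    }

    public static void main(String[] args) {
        Map<String, String> onlineRegions = new ConcurrentHashMap<>();
        onlineRegions.put("region-a", "info-a");
        try {
            snapshot(onlineRegions, () -> onlineRegions.put("region-b", "info-b"));
            System.out.println("no exception");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException"); // printed
        }
    }
}
```

In a real region server the "hook" is simply a region open on another thread, so the exception is timing-dependent rather than guaranteed.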
At a minimum, we should check, after obtaining the iterator, that
this.onlineRegions.size() still equals the size read before the iterator was
obtained.
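A size check narrows the window but may not close it: ConcurrentHashMap iterators are weakly consistent and can still return regions added after the iterator was created. A minimal sketch of an alternative (not what was proposed or committed here) is to stop pre-sizing the array and let a list grow with whatever the iteration actually returns. GrowingSnapshot and the getRegionInfo function parameter are hypothetical stand-ins for HRegionServer and HRegion.getRegionInfo().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class GrowingSnapshot {
    // Hypothetical variant of getRegionsAssignment(): there is no separate size()
    // call, so regions added or removed mid-iteration cannot overflow the result
    // or leave null slots in it.
    public static <V> String[] getRegionsAssignment(Map<String, V> onlineRegions,
                                                    Function<? super V, String> getRegionInfo) {
        List<String> regions = new ArrayList<>();
        // ConcurrentHashMap iteration never throws ConcurrentModificationException;
        // it may or may not reflect updates made while it runs.
        for (V region : onlineRegions.values()) {
            regions.add(getRegionInfo.apply(region));
        }
        // The array is sized from what was actually collected.
        return regions.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Map<String, String> onlineRegions = new ConcurrentHashMap<>();
        onlineRegions.put("region-a", "info-a");
        onlineRegions.put("region-b", "info-b");
        // Open a third region while the snapshot is being taken.
        String[] infos = getRegionsAssignment(onlineRegions, info -> {
            onlineRegions.putIfAbsent("region-c", "info-c");
            return info;
        });
        // 2 or 3, depending on whether the iterator saw region-c; never an exception.
        System.out.println(infos.length);
    }
}
```

The trade-off is a short-lived list allocation per call, which is usually negligible next to the cost of a wrong-sized array.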
> Weird blocking between getOnlineRegion and createRegionLoad
> -----------------------------------------------------------
>
> Key: HBASE-3654
> URL: https://issues.apache.org/jira/browse/HBASE-3654
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.1
> Reporter: Jean-Daniel Cryans
> Assignee: Subbu M Iyer
> Priority: Blocker
> Fix For: 0.90.2
>
> Attachments: ConcurrentHM, ConcurrentSKLM, CopyOnWrite,
> HBASE-3654-ConcurrentHashMap-RemoveGetSync.patch,
> HBASE-3654_Weird_blocking_getOnlineRegions_and_createServerLoad_-_COWAL.patch,
> HBASE-3654_Weird_blocking_getOnlineRegions_and_createServerLoad_-_COWAL1.patch,
> HBASE-3654_Weird_blocking_getOnlineRegions_and_createServerLoad_-_ConcurrentHM.patch,
> TestOnlineRegions.java, hashmap
>
>
> Saw this when debugging something else:
> {code}
> "regionserver60020" prio=10 tid=0x00007f538c1c0000 nid=0x4c7 runnable [0x00007f53931da000]
>    java.lang.Thread.State: RUNNABLE
>         at org.apache.hadoop.hbase.regionserver.Store.getStorefilesIndexSize(Store.java:1380)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:916)
>         - locked <0x0000000672aa0a00> (a java.util.concurrent.ConcurrentSkipListMap)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:767)
>         - locked <0x0000000656f62710> (a java.util.HashMap)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:722)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:591)
>         at java.lang.Thread.run(Thread.java:662)
> "IPC Reader 9 on port 60020" prio=10 tid=0x00007f538c1be000 nid=0x4c6 waiting for monitor entry [0x00007f53932db000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getFromOnlineRegions(HRegionServer.java:2295)
>         - waiting to lock <0x0000000656f62710> (a java.util.HashMap)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getOnlineRegion(HRegionServer.java:2307)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2333)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.isMetaRegion(HRegionServer.java:379)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:422)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:361)
>         at org.apache.hadoop.hbase.ipc.HBaseServer.getQosLevel(HBaseServer.java:1126)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:982)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
>         - locked <0x0000000656e60068> (a org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> ...
> "IPC Reader 0 on port 60020" prio=10 tid=0x00007f538c08b000 nid=0x4bd waiting for monitor entry [0x00007f5393be4000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getFromOnlineRegions(HRegionServer.java:2295)
>         - waiting to lock <0x0000000656f62710> (a java.util.HashMap)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getOnlineRegion(HRegionServer.java:2307)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2333)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.isMetaRegion(HRegionServer.java:379)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:422)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:361)
>         at org.apache.hadoop.hbase.ipc.HBaseServer.getQosLevel(HBaseServer.java:1126)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:982)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
>         - locked <0x0000000656e635c8> (a org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> All the readers are blocked! I have the feeling something much better could
> be done.
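The attachment names (ConcurrentHM, ConcurrentSKLM, CopyOnWrite, HBASE-3654-ConcurrentHashMap-RemoveGetSync.patch) point at one direction: let readers look up regions without taking the monitor that buildServerLoad() holds while it builds region loads. A minimal sketch of that idea, not the committed patch. NonBlockingLookup, lookupWhileReporting(), and the String values standing in for HRegion are hypothetical, and the latches only simulate a report thread paused mid-iteration the way regionserver60020 is in the dump above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class NonBlockingLookup {
    // Hypothetical stand-in for HRegionServer.onlineRegions (encoded name -> region).
    final Map<String, String> onlineRegions = new ConcurrentHashMap<>();

    // No synchronized block: ConcurrentHashMap.get does not wait on a thread
    // that is iterating the map, unlike the HashMap monitor in the thread dump.
    String getFromOnlineRegions(String encodedName) {
        return onlineRegions.get(encodedName);
    }

    // Looks up a region while a simulated report thread is paused inside
    // its iteration over onlineRegions, and returns what the lookup found.
    static String lookupWhileReporting() throws InterruptedException {
        NonBlockingLookup rs = new NonBlockingLookup();
        rs.onlineRegions.put("region-a", "info-a");
        rs.onlineRegions.put("region-b", "info-b");
        CountDownLatch midIteration = new CountDownLatch(1);
        CountDownLatch finishReport = new CountDownLatch(1);
        Thread reporter = new Thread(() -> {
            try {
                for (String region : rs.onlineRegions.values()) {
                    midIteration.countDown(); // report thread is now inside the loop
                    finishReport.await();     // stands in for slow createRegionLoad work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reporter.start();
        midIteration.await();
        // Returns immediately even though the reporter has not finished iterating.
        String found = rs.getFromOnlineRegions("region-a");
        finishReport.countDown();
        reporter.join();
        return found;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(lookupWhileReporting()); // prints info-a
    }
}
```

With the original synchronized HashMap, the same lookup would wait until the reporter released the monitor, which is exactly what the blocked IPC readers show.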
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira