[ 
https://issues.apache.org/jira/browse/HBASE-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620054#comment-13620054
 ] 

Elliott Clark commented on HBASE-7871:
--------------------------------------

getMetrics is called from a thread spawned by the Hadoop metrics system, which 
calls it to copy all of the values that a Source has.  That thread is always 
outside of HBase's control.

* Anything we do in hadoop1-compat will probably have to be done in 
hadoop2-compat as well.
** 
https://github.com/apache/hbase/blob/trunk/hbase-hadoop1-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java#L49
** 
https://github.com/apache/hbase/blob/trunk/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java#L50
* I think we should go with reader-writer locks inside the aggregate source, 
where the actual manipulation of the tree map happens.
** getMetrics takes the read lock
** Adding or removing region sources takes the write lock
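
A rough sketch of the locking proposed above, not the actual patch. The class 
name AggregateSourceSketch, the String element type, and the snapshot() method 
are hypothetical stand-ins for the aggregate source, its region sources, and 
the copy that getMetrics makes. It assumes the set is a plain java.util.TreeSet, 
which is not safe to modify while another thread is reading it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the aggregate source; not the HBase class.
class AggregateSourceSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // TreeSet is not thread-safe, so every access goes through the lock.
    private final TreeSet<String> regionSources = new TreeSet<String>();

    // Adding a region source mutates the tree: take the write lock.
    void register(String source) {
        lock.writeLock().lock();
        try {
            regionSources.add(source);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Removing a region source also mutates the tree: take the write lock.
    void deregister(String source) {
        lock.writeLock().lock();
        try {
            regionSources.remove(source);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Stand-in for getMetrics: only reads the tree, so the read lock is enough.
    // Returns a copy so callers never iterate the shared set unlocked.
    List<String> snapshot() {
        lock.readLock().lock();
        try {
            return new ArrayList<String>(regionSources);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        AggregateSourceSketch agg = new AggregateSourceSketch();
        agg.register("region-b");
        agg.register("region-a");
        agg.deregister("region-a");
        System.out.println(agg.snapshot());
    }
}
```

Multiple metrics threads can hold the read lock at once, while concurrent 
region closes are serialized by the write lock, so the tree is never 
restructured while another thread is reading or modifying it.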
                
> HBase can be stuck when closing regions concurrently 
> -----------------------------------------------------
>
>                 Key: HBASE-7871
>                 URL: https://issues.apache.org/jira/browse/HBASE-7871
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: Nicolas Liochon
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.95.0, 0.98.0
>
>         Attachments: 7871.patch, 7871-v2.patch, s1.txt, TestStartStop.java
>
>
> The attached test fails ~1% of the time on 0.96. It does not seem to 
> fail on 0.94.5. It's simple: a table creation and some puts.
> I attach the stack. The logs seem to say nothing.
> The suspicious part is:
> {noformat}
> "RS_CLOSE_REGION-localhost,57575,1361197489166-2" prio=10 
> tid=0x00007fb0c8775800 nid=0x61ac runnable [0x00007fb09f272000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.TreeMap.fixAfterDeletion(TreeMap.java:2193)
>         at java.util.TreeMap.deleteEntry(TreeMap.java:2151)
>         at java.util.TreeMap.remove(TreeMap.java:585)
>         at java.util.TreeSet.remove(TreeSet.java:259)
>         at 
> org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.deregister(MetricsRegionAggregateSourceImpl.java:55)
>         at 
> org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.close(MetricsRegionSourceImpl.java:86)
>         at 
> org.apache.hadoop.hbase.regionserver.MetricsRegion.close(MetricsRegion.java:40)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1063)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:969)
>         - locked <0x00000006944e2558> (a java.lang.Object)
>         at 
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:146)
>         at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:203)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> {noformat}

