[jira] [Commented] (HBASE-4101) Regionserver Deadlock

ramkrishna.s.vasudevan (JIRA) Fri, 15 Jul 2011 04:36:29 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065868#comment-13065868
 ]


ramkrishna.s.vasudevan commented on HBASE-4101:
-----------------------------------------------

Before submitting the patch, I would like to share my analysis of why the Date 
object is the problem.


MemStoreFlusher.flushOneForGlobalPressure() and 
MemStoreFlusher.reclaimMemStoreMemory() are the 
two methods where the problem occurs.

As part of flushOneForGlobalPressure(), flushRegions() gets called. Here 
{noformat}
lock.lock();
{noformat}
 is obtained.
The flow then goes into server.compactSplitThread.requestCompaction(), 
where the region is added to the CompactionQueue. 
PriorityCompactionQueue.addToRegionsInQueue() logs the newRequest, 
which calls the request's toString() and, through the string concatenation 
there, the toString() of the Date object:
{noformat}
public String toString() {
      return "regionName=" + r.getRegionNameAsString() +
        ", priority=" + p + ", date=" + date;
}
{noformat}

Internally, the Date object uses ResourceBundle, where the endLoading() method
{noformat}
        Thread me = Thread.currentThread();
        assert (underConstruction.get(constKey) == me);
        underConstruction.remove(constKey);
        synchronized (me) {
            me.notifyAll();
        }
{noformat} 
tries to take the monitor of the current thread (here, the MemStoreFlusher 
thread itself).
In parallel, MemStoreFlusher.reclaimMemStoreMemory(), which is itself 
synchronized, is being called.
So the other thread has obtained the MemStoreFlusher monitor and waits to 
obtain the lock in
{noformat}
    if (isAboveHighWaterMark()) {
      lock.lock();
{noformat}

Meanwhile ResourceBundle, running on the flusher thread that already holds 
lock, waits for the MemStoreFlusher monitor. Each thread holds what the other 
needs, so this leads to a deadlock.
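
To make the lock ordering easier to see, here is a minimal standalone sketch 
of the pattern described above. It is not HBase code: LockOrderDemo, Flusher, 
reclaim() and inversionObserved() are hypothetical names, the two latches exist 
only to force the interleaving, and tryLock() with a timeout replaces the real 
lock.lock() so that the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderDemo {

    /** Hypothetical stand-in for MemStoreFlusher: a Thread that also owns an explicit lock. */
    static class Flusher extends Thread {
        final ReentrantLock lock = new ReentrantLock();
        final CountDownLatch holdsLock = new CountDownLatch(1);
        final CountDownLatch monitorTaken = new CountDownLatch(1);

        @Override
        public void run() {
            lock.lock(); // same order as the flush path: explicit lock first
            try {
                holdsLock.countDown();
                monitorTaken.await(); // wait until the caller is inside reclaim()
                // Stand-in for ResourceBundle.endLoading(): synchronize on the
                // current thread, which is this Flusher object.
                Thread me = Thread.currentThread();
                synchronized (me) {
                    me.notifyAll();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        }

        /** Stand-in for reclaimMemStoreMemory(): takes the monitor, then wants the lock. */
        synchronized boolean reclaim() throws InterruptedException {
            monitorTaken.countDown();
            // The real code calls lock.lock() and would block forever here.
            boolean acquired = lock.tryLock(500, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.unlock();
            }
            return acquired;
        }
    }

    /** Returns true when reclaim() could not get the lock, i.e. the inversion was hit. */
    static boolean inversionObserved() {
        try {
            Flusher flusher = new Flusher();
            flusher.start();
            flusher.holdsLock.await(); // flusher thread now holds the explicit lock
            boolean acquired = flusher.reclaim();
            flusher.join();
            return !acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("inversion observed: " + inversionObserved());
    }
}
```

The flusher thread takes the explicit lock and then needs its own monitor, 
while the caller of reclaim() takes that monitor and then needs the explicit 
lock. With a plain lock.lock() in reclaim() neither thread could ever proceed.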

> Regionserver Deadlock
> ---------------------
>
>                 Key: HBASE-4101
>                 URL: https://issues.apache.org/jira/browse/HBASE-4101
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.3
>         Environment: CentOS 5.5, CDH3 u0 Hadoop, HBase 0.90.3
>            Reporter: Matt Davies
>            Priority: Blocker
>             Fix For: 0.90.4
>
>         Attachments: jstack.txt
>
>
> We periodically see a situation where the regionserver process exists in the 
> process list, zookeeper thread sends the keepalive so the master won't remove 
> it from the active list, yet the regionserver will not serve data.
> Hadoop(cdh3u0), HBase 0.90.3 (Apache version), under load from an internal 
> testing tool.
> Attached is the full JStack

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
