[jira] [Commented] (HBASE-17653) HBASE-17624 rsgroup synchronizations will (distributed) deadlock

Hudson (JIRA) Fri, 17 Feb 2017 19:57:06 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-17653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872975#comment-15872975
 ]


Hudson commented on HBASE-17653:
--------------------------------

SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #2524 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2524/])
HBASE-17653 HBASE-17624 rsgroup synchronizations will (distributed) (stack: rev 
b392de3e315aa260e2825484e418701919eb7622)
* (edit) 
hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdminServer.java
* (edit) 
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsOfflineMode.java
* (edit) 
hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java
* (edit) 
hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupInfoManagerImpl.java
* (edit) 
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroups.java


> HBASE-17624 rsgroup synchronizations will (distributed) deadlock
> ----------------------------------------------------------------
>
>                 Key: HBASE-17653
>                 URL: https://issues.apache.org/jira/browse/HBASE-17653
>             Project: HBase
>          Issue Type: Bug
>          Components: rsgroup
>            Reporter: stack
>            Assignee: stack
>             Fix For: 2.0.0
>
>         Attachments: HBASE-17653.master.001.patch, 
> HBASE-17653.master.002.patch, HBASE-17653.master.003.patch
>
>
> Follow-on from HBASE-17624. HBASE-17624 made it so one thread only has access 
> to the rsgroup administrator. In tail of HBASE-17624 [~toffer] describes 
> scenario under which we  may end up in a deadlock (distributed). Let me 
> repeat [~toffer] comment...
> {code}
> Both read/write access can't be single threaded. Consider the situation:
> 1. move_rsgroup_servers is called
> 2. while #1 is happening rsgroup region is in transition (rpc thread in #1 
> holds monitor lock)
> 3. while #2 is happening meta is in transition.
> Balancer tries to figure out plan for meta region tries to get monitor lock 
> but can't. rpc thread task won't release monitor lock since rsgroup region 
> never gets assigned. rsgroup region never gets assigned because it can't 
> update meta with new state.
> There's a good chance this can be reproduce just by moving both rsgroup and 
> meta region onto the same RS and call move_rsgoup_servers on the same RS.
> A bunch different actors will query from group affiliation so we can't have 
> writes block reads.
> ....
> In the code prior to this patch the getter methods that retrieve group 
> information (getRSGroup, ofTable, OfServer, etc) don't require the monitor 
> lock so the deadlock cycle is broken.
> ....
> The methods that does mutations and updates to zk and hbase:rsgroup are 
> synchronized appropriately. Point me to where the incoherence is?
> {code}
> This issue is about testing/fixing/restoring rsgroup access. Will be back.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17653) HBASE-17624 rsgroup synchronizations will (distributed) deadlock

Reply via email to