[ https://issues.apache.org/jira/browse/HBASE-17653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872975#comment-15872975 ]
Hudson commented on HBASE-17653:
--------------------------------

SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #2524 (See [https://builds.apache.org/job/HBase-Trunk_matrix/2524/])
HBASE-17653 HBASE-17624 rsgroup synchronizations will (distributed) (stack: rev b392de3e315aa260e2825484e418701919eb7622)
* (edit) hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdminServer.java
* (edit) hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsOfflineMode.java
* (edit) hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java
* (edit) hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupInfoManagerImpl.java
* (edit) hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroups.java

> HBASE-17624 rsgroup synchronizations will (distributed) deadlock
> ----------------------------------------------------------------
>
>                 Key: HBASE-17653
>                 URL: https://issues.apache.org/jira/browse/HBASE-17653
>             Project: HBase
>          Issue Type: Bug
>          Components: rsgroup
>            Reporter: stack
>            Assignee: stack
>             Fix For: 2.0.0
>
>         Attachments: HBASE-17653.master.001.patch,
> HBASE-17653.master.002.patch, HBASE-17653.master.003.patch
>
>
> Follow-on from HBASE-17624. HBASE-17624 made it so only one thread at a time
> has access to the rsgroup administrator. In the tail of HBASE-17624, [~toffer]
> describes a scenario under which we may end up in a (distributed) deadlock.
> Let me repeat [~toffer]'s comment...
> {code}
> Both read/write access can't be single threaded. Consider the situation:
> 1. move_rsgroup_servers is called
> 2. while #1 is happening rsgroup region is in transition (rpc thread in #1
> holds monitor lock)
> 3. while #2 is happening meta is in transition.
> The balancer, trying to figure out a plan for the meta region, tries to get
> the monitor lock but can't. The rpc thread won't release the monitor lock
> since the rsgroup region never gets assigned. The rsgroup region never gets
> assigned because it can't update meta with its new state.
> There's a good chance this can be reproduced just by moving both the rsgroup
> and meta regions onto the same RS and calling move_rsgroup_servers on that RS.
> A bunch of different actors will query for group affiliation, so we can't
> have writes block reads.
> ....
> In the code prior to this patch the getter methods that retrieve group
> information (getRSGroup, ofTable, OfServer, etc) don't require the monitor
> lock so the deadlock cycle is broken.
> ....
> The methods that do mutations and updates to zk and hbase:rsgroup are
> synchronized appropriately. Point me to where the incoherence is?
> {code}
> This issue is about testing/fixing/restoring rsgroup access. Will be back.
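
For readers following the quoted argument, here is a minimal sketch of the pattern it describes: mutations take the monitor lock, while getters read a published snapshot without locking, so a stuck writer cannot block the balancer's lookups. This is an illustration only, not the code in the attached patches. The class name RSGroupRegistrySketch, its methods, the "default" group fallback, and the host name are all hypothetical, and persisting to zk and hbase:rsgroup is left out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the HBase RSGroupInfoManagerImpl implementation.
class RSGroupRegistrySketch {
    // Immutable snapshot of server -> group, replaced wholesale on each write.
    // volatile makes a newly published snapshot visible to lock-free readers.
    private volatile Map<String, String> serverToGroup = Collections.emptyMap();

    // Read path: takes no monitor lock, so it cannot be blocked by a writer.
    String getGroupOfServer(String server) {
        return serverToGroup.getOrDefault(server, "default");
    }

    // Write path: serialized by the monitor lock. A real implementation would
    // also persist the change (assumed, omitted here) before publishing it.
    synchronized void moveServer(String server, String group) {
        Map<String, String> copy = new HashMap<>(serverToGroup);
        copy.put(server, group);
        serverToGroup = Collections.unmodifiableMap(copy);
    }

    public static void main(String[] args) throws InterruptedException {
        RSGroupRegistrySketch registry = new RSGroupRegistrySketch();
        registry.moveServer("rs1.example.org:16020", "groupA");

        String[] seen = new String[1];
        Thread reader = new Thread(
            () -> seen[0] = registry.getGroupOfServer("rs1.example.org:16020"));

        // Simulate a writer stuck while holding the monitor lock.
        synchronized (registry) {
            reader.start();
            reader.join(5000); // the reader finishes although the lock is held
        }
        System.out.println("reader saw: " + seen[0]);
    }
}
```

If getGroupOfServer were also declared synchronized, the reader thread in main would wait until the monitor is released, which is the blocking step in the cycle described above.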