[ 
https://issues.apache.org/jira/browse/HBASE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287416#comment-13287416
 ] 

ramkrishna.s.vasudevan commented on HBASE-6012:
-----------------------------------------------

@Chunhui
Please check the issue HBASE-6147.  Along with the scenario mentioned there if 
we take the current patch here I feel the following problem can come
-> Because as per the current patch attached here ,we force the znode to 
OFFLINE state,  if there was any current assignment going on due to SSH in one 
of the RS that will get stopped because of znode version mismatch.
Internally that failure to OPEN regin will make the znode to FAILED_OPEN.  Now 
based on this master will again start a new assignment. So this along with the 
issue in HBASE-6147 may lead to double assignment.  So what we felt here is 
this patch should go along with some changes in HBASE-6147.  
Pls feel free to correct me if am wrong.
                
> AssignmentManager#asyncSetOfflineInZooKeeper wouldn't force node offline
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6012
>                 URL: https://issues.apache.org/jira/browse/HBASE-6012
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.96.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.96.0
>
>         Attachments: HBASE-6012.patch
>
>
> As the javadoc of method and the log message
> {code}
> /**
>    * Set region as OFFLINED up in zookeeper asynchronously.
>    */
> boolean asyncSetOfflineInZooKeeper(
> ...
> master.abort("Unexpected ZK exception creating/setting node OFFLINE", e);
> ...
> }
> {code}
> I think AssignmentManager#asyncSetOfflineInZooKeeper should also force node 
> offline, just like AssignmentManager#setOfflineInZooKeeper do. Otherwise, it 
> may cause bulk assign failed which called this method.
> Error log on the master caused by the issue
> 2012-05-12 01:40:09,437 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
> was=writetest,1YTQDPGLXBTICHOPQ6IL,1336590857771.674da422fc7cb9a7d42c74499ace1d93.
>  state=PENDING_CLOSE, ts=1336757876856 
> 2012-05-12 01:40:09,437 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:60000-0x23736bf74780082 Async create of unassigned node for 
> 674da422fc7cb9a7d42c74499ace1d93 with OFFLINE state 
> 2012-05-12 01:40:09,446 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback:
>  rc != 0 for /hbase-func1/unassigned/674da422fc7cb9a7d42c74499ace1d93 -- 
> retryable connectionloss -- FIX see 
> http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A2 
> 2012-05-12 01:40:09,447 FATAL org.apache.hadoop.hbase.master.HMaster: 
> Connectionloss writing unassigned at 
> /hbase-func1/unassigned/674da422fc7cb9a7d42c74499ace1d93, rc=-110 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to