[ 
https://issues.apache.org/jira/browse/HBASE-8144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609876#comment-13609876
 ] 

ramkrishna.s.vasudevan commented on HBASE-8144:
-----------------------------------------------

Should we synchronize on failedOpenTracker where we update and remove entries 
from this concurrent hash map? Overall, the patch looks very good.
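
To make the question concrete, here is a minimal sketch of per-region failure 
counting that relies on the map's own atomic operations instead of an 
external lock. It is not the code in trunk-8144.patch: the class name, the 
method names and the maxAttempts parameter are all hypothetical, and only the 
field name failedOpenTracker is taken from the patch discussion.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch only; names other than failedOpenTracker are assumptions.
class FailedOpenTrackerSketch {
    // Encoded region name -> number of failed open attempts seen so far.
    private final ConcurrentMap<String, Integer> failedOpenTracker =
        new ConcurrentHashMap<>();
    private final int maxAttempts; // assumed to come from configuration

    FailedOpenTrackerSketch(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /**
     * Records one failed open. merge() is atomic per key, so concurrent
     * callers cannot lose an increment.
     * Returns true once the region has used up its attempts.
     */
    boolean recordFailedOpen(String encodedRegionName) {
        int failures = failedOpenTracker.merge(encodedRegionName, 1, Integer::sum);
        return failures >= maxAttempts;
    }

    /** Clears the counter, for example after the region opens successfully. */
    void regionOpened(String encodedRegionName) {
        failedOpenTracker.remove(encodedRegionName);
    }

    /** Current failure count for a region, 0 if none is tracked. */
    int failures(String encodedRegionName) {
        return failedOpenTracker.getOrDefault(encodedRegionName, 0);
    }
}
```

With single atomic calls like these, each update or remove is safe on its 
own. Explicit synchronization would still be needed wherever the code reads 
the count and then updates or removes the entry as two separate steps.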

[~jxiang]
Actually, I recently saw this scenario in 0.94: Lars shared some logs with me 
in which the region open kept failing because the compression codec was not 
found when the RS tried to open the region.

So this change will at least avoid the assignment bouncing continuously 
between the master and the RS.
HBASE-8049 is to do that.  After this patch, I think we can make that issue 
work like this:
In case of FAILED_OPEN, can we add the exception message, or the reason why 
it failed, to the znode, so that once the retries are exhausted we can use 
that info to tell the user about the problem?
Let me follow up on that JIRA.
Coming back to this JIRA:
Once these retries are exhausted, how do we reassign the region again?  Just 
in case.
                
> Limit number of attempts to assign a region
> -------------------------------------------
>
>                 Key: HBASE-8144
>                 URL: https://issues.apache.org/jira/browse/HBASE-8144
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>            Priority: Minor
>             Fix For: 0.95.0, 0.98.0
>
>         Attachments: trunk-8144.patch
>
>
> In sending a region open request to a region server, we make sure we try at 
> most a configured number of times.  However, once the request is accepted by 
> the region server, the region could go through this transition forever: 
> failed_open (in ZK) => closed => opening => failed_open (in ZK), assuming no 
> RPC/network issue.
> It would be good to break the loop, limit the number of tries, and move the 
> region to the failed_open state (to be introduced in HBASE-8137).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
