[ 
https://issues.apache.org/jira/browse/HBASE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238328#comment-13238328
 ] 

Uma Maheswara Rao G commented on HBASE-5635:
--------------------------------------------

Yes, I think continuing without the SplitLogWorker may not be good behaviour,
because that particular RegionServer may have more capacity to take up new
regions. With the current behaviour, it will not compete for any new
splitlog work.

I feel we can retry a few times and then shut down the RegionServer.
The other option is to retry forever on any ZK exception, and exit only on
InterruptedException.
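
A minimal sketch of both options, assuming a hypothetical ChildLister interface in place of ZKUtil.listChildrenAndWatchForNewChildren and a stand-in ZkUnavailableException in place of KeeperException. Passing maxAttempts <= 0 selects retry-forever, where only an interrupt ends the loop; otherwise the hypothetical onGiveUp hook (for example, aborting the RegionServer) runs once the retries are used up.

```java
import java.util.List;

/** Stand-in for ZooKeeper's KeeperException; hypothetical, for illustration only. */
class ZkUnavailableException extends Exception {
  ZkUnavailableException(String message) {
    super(message);
  }
}

/** Hypothetical abstraction over ZKUtil.listChildrenAndWatchForNewChildren. */
interface ChildLister {
  List<String> listChildren() throws ZkUnavailableException;
}

class RetryingTaskLister {
  private final ChildLister lister;
  private final int maxAttempts;   // <= 0 means "retry forever"
  private final long sleepMillis;  // pause after each failed attempt
  private final Runnable onGiveUp; // hypothetical hook, e.g. abort the RegionServer

  RetryingTaskLister(ChildLister lister, int maxAttempts, long sleepMillis,
      Runnable onGiveUp) {
    this.lister = lister;
    this.maxAttempts = maxAttempts;
    this.sleepMillis = sleepMillis;
    this.onGiveUp = onGiveUp;
  }

  /**
   * Returns whatever the first successful listChildren() call returns.
   * Otherwise returns null when interrupted, or (bounded mode only) after
   * onGiveUp has run.
   */
  List<String> getTaskList() {
    for (int attempt = 1; maxAttempts <= 0 || attempt <= maxAttempts; attempt++) {
      try {
        return lister.listChildren();
      } catch (ZkUnavailableException e) {
        System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
        try {
          Thread.sleep(sleepMillis);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt(); // preserve the interrupt status
          return null; // the only way out in retry-forever mode
        }
      }
    }
    onGiveUp.run(); // bounded mode: retries used up, escalate instead of idling
    return null;
  }
}
```

The bounded mode matches "retry a few times and then shut down"; the unbounded mode keeps the worker alive through a ZK outage, which is the concern raised about all RegionServers hitting this at once.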

Also, this issue may be a bit dangerous, because if ZK is unavailable for
some time, all RegionServers may hit this problem and none of them will
take up the splitlog work.

listChildrenAndWatchForNewChildren returns null only if the node does not
exist. If the node has no children, it returns an empty list.
Since zookeeper.znode.splitlog will always be set, the null case to worry
about comes from the KeeperException path.

The other KeeperExceptions, such as ZK unavailability, are what we have to handle.
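
A minimal sketch of the distinction above, assuming a hypothetical SplitLogZNodeView in place of the real ZKUtil call and a stand-in ZkCallFailedException for KeeperException. It separates the four outcomes of one listing attempt, so that an exception such as ZK being unavailable is treated as its own case (retry) rather than being folded into a null that stops the worker.

```java
import java.util.List;

/** Stand-in for KeeperException (e.g. connection loss); hypothetical. */
class ZkCallFailedException extends Exception {
  ZkCallFailedException(String message) {
    super(message);
  }
}

/** Hypothetical view of the splitlog znode; returns null when the znode does not exist. */
interface SplitLogZNodeView {
  List<String> children() throws ZkCallFailedException;
}

/** Possible outcomes of one attempt to read the task list. */
enum TaskListOutcome { ZNODE_MISSING, NO_TASKS, HAS_TASKS, ZK_ERROR }

final class TaskListClassifier {
  private TaskListClassifier() {}

  static TaskListOutcome classify(SplitLogZNodeView znode) {
    try {
      List<String> children = znode.children();
      if (children == null) {
        return TaskListOutcome.ZNODE_MISSING; // only when the znode does not exist
      }
      // An existing znode with no children yields an empty list, not null.
      return children.isEmpty() ? TaskListOutcome.NO_TASKS : TaskListOutcome.HAS_TASKS;
    } catch (ZkCallFailedException e) {
      return TaskListOutcome.ZK_ERROR; // e.g. ZK unavailable: retry, don't stop the worker
    }
  }
}
```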
                
> If getTaskList() returns null splitlogWorker is down. It wont serve any 
> requests. 
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-5635
>                 URL: https://issues.apache.org/jira/browse/HBASE-5635
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 0.92.1
>            Reporter: Kristam Subba Swathi
>
> During the HLog split operation, if all the ZooKeeper servers are down, the
> paths are returned as null and the split worker thread exits.
> This RegionServer will then not be able to acquire any other tasks, since
> the split worker thread has exited.
> Please see the code below for more details:
> ------------------------------------------
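> private List<String> getTaskList() {
>     // Try up to zkretries times to list (and watch) the splitlog znode.
>     for (int i = 0; i < zkretries; i++) {
>       try {
>         return (ZKUtil.listChildrenAndWatchForNewChildren(this.watcher,
>             this.watcher.splitLogZNode));
>       } catch (KeeperException e) {
>         // e.g. all ZooKeeper servers down: log, back off for 1 s, retry.
>         LOG.warn("Could not get children of znode " +
>             this.watcher.splitLogZNode, e);
>         try {
>           Thread.sleep(1000);
>         } catch (InterruptedException e1) {
>           LOG.warn("Interrupted while trying to get task list ...", e1);
>           Thread.currentThread().interrupt();
>           return null;
>         }
>       }
>     }
>     // All zkretries attempts failed: null is returned here and, per this
>     // report, the SplitLogWorker thread then exits.
>     return null;
> }
> (from org.apache.hadoop.hbase.regionserver.SplitLogWorker)
>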

--
This message is automatically generated by JIRA.

        
