[ 
https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463401#comment-13463401
 ] 

Jean-Daniel Cryans commented on HBASE-5844:
-------------------------------------------

One thing that worries me about this patch is the situation where the pid file is
gone and someone tries to start the region server. It happened to me a bunch of
times. I tried it with your patch, and since it removes the ephemeral znode it
_kills_ the region server that's already running and doesn't start a new one
because the ports are already occupied.
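A minimal sketch of the kind of guard this failure mode suggests. It assumes the start script could check whether the region server port is still bound before deleting the znode. The names `port_in_use` and `safe_to_delete_znode`, and the `rs_port` parameter, are hypothetical illustrations, not part of the attached patches.

```python
import os
import socket


def port_in_use(port: int, host: str = "") -> bool:
    """Return True if some socket is already bound to `port` on `host`.

    On typical Linux/BSD stacks, binding to the wildcard address ("")
    conflicts with any IPv4 socket already bound to the same port, so a
    still-running server is detected.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return True
    return False


def safe_to_delete_znode(pid_file: str, rs_port: int, host: str = "") -> bool:
    """Hypothetical guard a start script could run before deleting the znode."""
    if os.path.exists(pid_file):
        # A pid file exists: leave it to the usual "already running" check.
        return False
    # A missing pid file is not proof that no region server is running: the
    # file can vanish while the process keeps going. If the port is still
    # bound, assume a live server owns the znode and keep it.
    return not port_in_use(rs_port, host)
```

The port check is only a heuristic (another process could hold the port); the point is that a missing pid file alone is not a reliable signal that the znode is stale.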

I'm not sure if this is related to this patch, but we're now missing the startup
output when using the scripts. We used to have:

{noformat}
su-jdcryans-2:0.94 jdcryans$ ./bin/start-hbase.sh
localhost: starting zookeeper, logging to /Users/jdcryans/Work/HBase/0.94/bin/../logs/hbase-jdcryans-zookeeper-h-25-185.sfo.stumble.net.out
starting master, logging to /Users/jdcryans/Work/HBase/0.94/bin/../logs/hbase-jdcryans-master-h-25-185.sfo.stumble.net.out
localhost: starting regionserver, logging to /Users/jdcryans/Work/HBase/0.94/bin/../logs/hbase-jdcryans-regionserver-h-25-185.sfo.stumble.net.out
{noformat}

Now we have:

{noformat}
su-jdcryans-2:trunk-commit jdcryans$ ./bin/start-hbase.sh 

su-jdcryans-2:trunk-commit jdcryans$ 
{noformat}
                
> Delete the region servers znode after a regions server crash
> ------------------------------------------------------------
>
>                 Key: HBASE-5844
>                 URL: https://issues.apache.org/jira/browse/HBASE-5844
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>             Fix For: 0.96.0
>
>         Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch, 5844.v3.patch, 5844.v4.patch
>
>
> Today, if the region server crashes, its znode is not deleted in ZooKeeper.
> So the recovery process will start only after a timeout, usually 30s.
> By deleting the znode in the start script, we remove this delay and the recovery
> starts immediately.
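
The idea in the description can be sketched as a wrapper that runs the server in the foreground and, once it exits, removes the znode right away instead of waiting for the ZooKeeper session to expire. This is a minimal illustration only: `run_and_cleanup` and the `delete_znode` callback are hypothetical stand-ins (a real script would call a ZooKeeper client), not the code from the attached patches.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_and_cleanup(cmd: Sequence[str], delete_znode: Callable[[], None]) -> int:
    """Run the server in the foreground, then delete its znode once it exits.

    Without this step the ephemeral znode is only removed when ZooKeeper
    expires the dead process's session (usually around 30 s), and recovery
    waits for that.
    """
    proc = subprocess.Popen(list(cmd))
    returncode = proc.wait()
    # The process this wrapper started is gone, so its znode is stale.
    # After a clean shutdown the znode is normally already gone, so the
    # hypothetical callback should tolerate a missing node.
    delete_znode()
    return returncode


if __name__ == "__main__":
    # Simulate a crashing "region server" with a child that exits with status 1.
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    rc = run_and_cleanup(crashing, lambda: print("znode deleted"))
    print("server exited with", rc)
```

In this sketch the cleanup only runs after the wrapper's own child has exited, so it never touches the znode of a server it did not start.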
