[ 
https://issues.apache.org/jira/browse/HBASE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-3442.
-----------------------------------

    Resolution: Invalid

Issue wasn't actionable

> Master failing when node disconnects or dies
> --------------------------------------------
>
>                 Key: HBASE-3442
>                 URL: https://issues.apache.org/jira/browse/HBASE-3442
>             Project: HBase
>          Issue Type: Bug
>          Components: master, regionserver
>    Affects Versions: 0.90.0
>         Environment: CentOS 5, HBase 0.90 RC3, Amazon EC2
>            Reporter: Justin
>            Priority: Minor
>
> We've got our servers running on Amazon EC2 and nodes will go through some 
> shutdown scripts if/when we want to take them out of the mix.  Ended up 
> shutting down one of the nodes, in this case Node98, which caused the 
> immediate crash of the master server.  Upon restarting the master, it would 
> attempt to contact the missing node, and then stop its startup process.  I 
> believe the node removed itself from the DNS server first, then ran a stop on 
> the datanode and regionserver.  The missing node was also removed from any 
> slave/regionserver list on the master server.  I finally put in a bogus entry 
> in the /etc/hosts file for the missing node, pointing it back to 127.0.0.1, 
> and the master server finally marked it as a dead node, ignored it, and 
> finished the startup process.
> I'm going to try to replicate it and save some more logs.  The following 
> log is the only thing I saved from the first occurrence; it shows the master 
> failing to start up while checking for the missing node:  
> http://pastebin.com/ZyQMQm91



--
This message was sent by Atlassian JIRA
(v6.2#6252)
