[ 
https://issues.apache.org/jira/browse/HBASE-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090844#comment-13090844
 ] 

Sudharsan Sampath commented on HBASE-3331:
------------------------------------------

I am seeing this issue in version 0.90.1. My test environment has two servers: 
one hosts both the master and a regionserver, and the other hosts only a 
regionserver. HBase manages ZK, and the quorum contains both servers. Both the 
ROOT and the META regions are on one of my regionservers. If that regionserver 
is stopped or killed, the master web page does not come up and throws 
Connection refused when attempting to contact the regionserver. The master 
server logs, though, seem to relate more to the ROOT region. Should I open a 
new issue?



2011-08-25 12:50:23,531 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
locateRegionInMeta parentTable=-ROOT-, metaLocation=address: <<server>>:60020, 
regioninfo: -ROOT-,,0.70236052, attempt=8 of 10 failed; retrying after sleep of 
16000 because: Connection refused
2011-08-25 12:50:23,531 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
Lookedup root region location, 
connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@62135133;
 hsa=<<server>>:60020
2011-08-25 12:50:39,531 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
Lookedup root region location, 
connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@62135133;
 hsa=<<server>>:60020
2011-08-25 12:50:39,532 WARN org.mortbay.log: /master.jsp: 
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact 
region server null for region , row '', but failed after 10 attempts.
Exceptions:
java.net.ConnectException: Connection refused
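
For readers following the timestamps above: the 16000 ms sleep at attempt 8 
and the 16-second gap between the two "Lookedup root region location" lines 
are consistent with a client retry pause that grows by a fixed multiplier 
table. The sketch below only illustrates that arithmetic. The class name, the 
method names, the multiplier table, and the 1000 ms base pause are assumptions 
made for illustration; they are not copied from the HBase source and should be 
checked against the version in use.

```java
public class RetryBackoffSketch {
    // Assumed multiplier table (hypothetical, for illustration only).
    // Index 8 is 16, which matches "attempt=8 ... sleep of 16000" in the
    // log above when the base pause is 1000 ms.
    static final int[] RETRY_BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32};

    // Pause before the next retry: base pause times the multiplier for this
    // attempt. Attempts outside the table are clamped to its first or last entry.
    static long pauseMillis(long basePauseMillis, int attempt) {
        int idx = Math.max(0, Math.min(attempt, RETRY_BACKOFF.length - 1));
        return basePauseMillis * RETRY_BACKOFF[idx];
    }

    public static void main(String[] args) {
        long basePauseMillis = 1000L; // assumed base pause, in milliseconds
        for (int attempt = 0; attempt < RETRY_BACKOFF.length; attempt++) {
            System.out.println("attempt=" + attempt
                    + " sleep=" + pauseMillis(basePauseMillis, attempt) + " ms");
        }
    }
}
```

Under these assumptions, a master page request that hits a dead ROOT or META 
location goes through every pause in the table before 
RetriesExhaustedException is raised, which would explain why master.jsp fails 
rather than recovering quickly.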


> Kill -STOP of RS hosting META does not recover
> ----------------------------------------------
>
>                 Key: HBASE-3331
>                 URL: https://issues.apache.org/jira/browse/HBASE-3331
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: timeouts.log.txt
>
>
> If you find the server hosting META and kill -STOP its region server, it will 
> eventually lose its ZK session and the master will split its logs and try to 
> reassign. However, at some point along here it tries to access the old META, 
> and gets SocketTimeoutExceptions, which cause it to keep retrying forever. 
> Once I kill -9ed the stopped server, things came back to life.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
