[
https://issues.apache.org/jira/browse/HBASE-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090844#comment-13090844
]
Sudharsan Sampath commented on HBASE-3331:
------------------------------------------
I am seeing this issue in version 0.90.1. My test environment has two servers:
one hosts both the master and a regionserver, the other hosts only a
regionserver. HBase manages ZK, and the quorum contains both servers. Both the
ROOT and META regions are on one of my regionservers. If that regionserver is
stopped or killed, the master web page does not come up and throws Connection
refused when attempting to contact the region server. The master server logs
seem to be more related to the ROOT region, though. Should I open a new issue?
2011-08-25 12:50:23,531 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
locateRegionInMeta parentTable=-ROOT-, metaLocation=address: <<server>>:60020,
regioninfo: -ROOT-,,0.70236052, attempt=8 of 10 failed; retrying after sleep of
16000 because: Connection refused
2011-08-25 12:50:23,531 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
Lookedup root region location,
connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@62135133;
hsa=<<server>>:60020
2011-08-25 12:50:39,531 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
Lookedup root region location,
connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@62135133;
hsa=<<server>>:60020
2011-08-25 12:50:39,532 WARN org.mortbay.log: /master.jsp:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
region server null for region , row '', but failed after 10 attempts.
Exceptions:
java.net.ConnectException: Connection refused
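
For reading the timings in the log above: attempt 8 of 10 sleeps 16000 ms, and
the next lookup is logged 16 seconds later, just before the retries are
exhausted. That pattern is consistent with a table-driven backoff. The sketch
below is only an illustration under assumptions: the 1000 ms base pause, the
multiplier table, and the function names are hypothetical and are not taken
from the log or confirmed against the 0.90.1 client code.

```python
# Assumed values (not shown in the log): base pause and multiplier table.
BASE_PAUSE_MS = 1000
BACKOFF_MULTIPLIERS = [1, 1, 1, 2, 2, 4, 4, 8, 16, 32]


def pause_ms(attempt: int) -> int:
    """Sleep applied after the zero-based `attempt` fails."""
    # Clamp so attempts past the end of the table reuse the last multiplier.
    index = min(attempt, len(BACKOFF_MULTIPLIERS) - 1)
    return BASE_PAUSE_MS * BACKOFF_MULTIPLIERS[index]


def total_wait_ms(max_attempts: int) -> int:
    """Total sleep if every attempt fails; no sleep follows the final attempt."""
    return sum(pause_ms(a) for a in range(max_attempts - 1))


if __name__ == "__main__":
    print(pause_ms(8))        # pause after attempt 8, as in the log line
    print(total_wait_ms(10))  # total time spent sleeping across 10 attempts
```

Under these assumptions a single page load can block for tens of seconds
before the RetriesExhaustedException surfaces, which matches the symptom of
master.jsp not coming up.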
> Kill -STOP of RS hosting META does not recover
> ----------------------------------------------
>
> Key: HBASE-3331
> URL: https://issues.apache.org/jira/browse/HBASE-3331
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Priority: Critical
> Fix For: 0.92.0
>
> Attachments: timeouts.log.txt
>
>
> If you find the server hosting META and kill -STOP its region server, it will
> eventually lose its ZK session and the master will split its logs and try to
> reassign. However, at some point along here it tries to access the old META,
> and gets SocketTimeoutExceptions, which cause it to keep retrying forever.
> Once I kill -9ed the stopped server, things came back to life.