[
https://issues.apache.org/jira/browse/HBASE-22041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
lujie updated HBASE-22041:
--------------------------
Description:
While the master is booting fresh, we crash (kill -9) the RS that holds meta. We find that
the master startup fails and prints thousands of log lines like:
{code:java}
2019-03-13 01:09:54,896 WARN [RSProcedureDispatcher-pool4-t1]
procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724
failed due to java.net.ConnectException: Call to hadoop14/172.16.1.131:16020
failed on connection exception:
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
syscall:getsockopt(..) failed: Connection refused:
hadoop14/172.16.1.131:16020, try=0, retrying...
2019-03-13 01:09:55,004 WARN [RSProcedureDispatcher-pool4-t2]
procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724
failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to
hadoop14/172.16.1.131:16020 failed on local exception:
org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed
servers list: hadoop14/172.16.1.131:16020, try=1, retrying...
2019-03-13 01:09:55,114 WARN [RSProcedureDispatcher-pool4-t3]
procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724
failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to
hadoop14/172.16.1.131:16020 failed on local exception:
org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed
servers list: hadoop14/172.16.1.131:16020, try=2, retrying...
2019-03-13 01:09:55,219 WARN [RSProcedureDispatcher-pool4-t4]
procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724
failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to
hadoop14/172.16.1.131:16020 failed on local exception:
org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed
servers list: hadoop14/172.16.1.131:16020, try=3, retrying...
2019-03-13 01:09:55,324 WARN [RSProcedureDispatcher-pool4-t5]
procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724
failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to
hadoop14/172.16.1.131:16020 failed on local exception:
org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed
servers list: hadoop14/172.16.1.131:16020, try=4, retrying...
2019-03-13 01:09:55,428 WARN [RSProcedureDispatcher-pool4-t6]
procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724
failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to
hadoop14/172.16.1.131:16020 failed on local exception:
org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed
servers list: hadoop14/172.16.1.131:16020, try=5, retrying...
2019-03-13 01:09:55,533 WARN [RSProcedureDispatcher-pool4-t7]
procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724
failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to
hadoop14/172.16.1.131:16020 failed on local exception:
org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed
servers list: hadoop14/172.16.1.131:16020, try=6, retrying...
2019-03-13 01:09:55,638 WARN [RSProcedureDispatcher-pool4-t8]
procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724
failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to
hadoop14/172.16.1.131:16020 failed on local exception:
org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed
servers list: hadoop14/172.16.1.131:16020, try=7, retrying...
2019-03-13 01:09:55,755 WARN [RSProcedureDispatcher-pool4-t9]
procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724
failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to
hadoop14/172.16.1.131:16020 failed on local exception:
org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed
servers list: hadoop14/172.16.1.131:16020, try=8, retrying...
{code}
was:
While the master is booting fresh, we shut down the RS that holds meta. We find that the
master startup fails and prints thousands of log lines like:
{code:java}
2019-03-13 01:09:54,896 WARN [RSProcedureDispatcher-pool4-t1]
procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724
failed due to java.net.ConnectException: Call to hadoop14/172.16.1.131:16020
failed on connection exception:
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
syscall:getsockopt(..) failed: Connection refused:
hadoop14/172.16.1.131:16020, try=0, retrying...
2019-03-13 01:09:55,004 WARN [RSProcedureDispatcher-pool4-t2]
procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724
failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to
hadoop14/172.16.1.131:16020 failed on local exception:
org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed
servers list: hadoop14/172.16.1.131:16020, try=1, retrying...
2019-03-13 01:09:55,114 WARN [RSProcedureDispatcher-pool4-t3]
procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724
failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to
hadoop14/172.16.1.131:16020 failed on local exception:
org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed
servers list: hadoop14/172.16.1.131:16020, try=2, retrying...
2019-03-13 01:09:55,219 WARN [RSProcedureDispatcher-pool4-t4]
procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724
failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to
hadoop14/172.16.1.131:16020 failed on local exception:
org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed
servers list: hadoop14/172.16.1.131:16020, try=3, retrying...
2019-03-13 01:09:55,324 WARN [RSProcedureDispatcher-pool4-t5]
procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724
failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to
hadoop14/172.16.1.131:16020 failed on local exception:
org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed
servers list: hadoop14/172.16.1.131:16020, try=4, retrying...
2019-03-13 01:09:55,428 WARN [RSProcedureDispatcher-pool4-t6]
procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724
failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to
hadoop14/172.16.1.131:16020 failed on local exception:
org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed
servers list: hadoop14/172.16.1.131:16020, try=5, retrying...
2019-03-13 01:09:55,533 WARN [RSProcedureDispatcher-pool4-t7]
procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724
failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to
hadoop14/172.16.1.131:16020 failed on local exception:
org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed
servers list: hadoop14/172.16.1.131:16020, try=6, retrying...
2019-03-13 01:09:55,638 WARN [RSProcedureDispatcher-pool4-t8]
procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724
failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to
hadoop14/172.16.1.131:16020 failed on local exception:
org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed
servers list: hadoop14/172.16.1.131:16020, try=7, retrying...
2019-03-13 01:09:55,755 WARN [RSProcedureDispatcher-pool4-t9]
procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724
failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to
hadoop14/172.16.1.131:16020 failed on local exception:
org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed
servers list: hadoop14/172.16.1.131:16020, try=8, retrying...
{code}
> Master stuck in startup and print "FailedServerException" forever
> -----------------------------------------------------------------
>
> Key: HBASE-22041
> URL: https://issues.apache.org/jira/browse/HBASE-22041
> Project: HBase
> Issue Type: Bug
> Reporter: lujie
> Priority: Critical
> Attachments: fixedlogs.zip
>
>
> While the master is booting fresh, we crash (kill -9) the RS that holds meta. We find
> that the master startup fails and prints thousands of log lines like:
> {code:java}
> 2019-03-13 01:09:54,896 WARN [RSProcedureDispatcher-pool4-t1]
> procedure.RSProcedureDispatcher: request to server
> hadoop14,16020,1552410583724 failed due to java.net.ConnectException: Call to
> hadoop14/172.16.1.131:16020 failed on connection exception:
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
> syscall:getsockopt(..) failed: Connection refused:
> hadoop14/172.16.1.131:16020, try=0, retrying...
> 2019-03-13 01:09:55,004 WARN [RSProcedureDispatcher-pool4-t2]
> procedure.RSProcedureDispatcher: request to server
> hadoop14,16020,1552410583724 failed due to
> org.apache.hadoop.hbase.ipc.FailedServerException: Call to
> hadoop14/172.16.1.131:16020 failed on local exception:
> org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the
> failed servers list: hadoop14/172.16.1.131:16020, try=1, retrying...
> 2019-03-13 01:09:55,114 WARN [RSProcedureDispatcher-pool4-t3]
> procedure.RSProcedureDispatcher: request to server
> hadoop14,16020,1552410583724 failed due to
> org.apache.hadoop.hbase.ipc.FailedServerException: Call to
> hadoop14/172.16.1.131:16020 failed on local exception:
> org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the
> failed servers list: hadoop14/172.16.1.131:16020, try=2, retrying...
> 2019-03-13 01:09:55,219 WARN [RSProcedureDispatcher-pool4-t4]
> procedure.RSProcedureDispatcher: request to server
> hadoop14,16020,1552410583724 failed due to
> org.apache.hadoop.hbase.ipc.FailedServerException: Call to
> hadoop14/172.16.1.131:16020 failed on local exception:
> org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the
> failed servers list: hadoop14/172.16.1.131:16020, try=3, retrying...
> 2019-03-13 01:09:55,324 WARN [RSProcedureDispatcher-pool4-t5]
> procedure.RSProcedureDispatcher: request to server
> hadoop14,16020,1552410583724 failed due to
> org.apache.hadoop.hbase.ipc.FailedServerException: Call to
> hadoop14/172.16.1.131:16020 failed on local exception:
> org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the
> failed servers list: hadoop14/172.16.1.131:16020, try=4, retrying...
> 2019-03-13 01:09:55,428 WARN [RSProcedureDispatcher-pool4-t6]
> procedure.RSProcedureDispatcher: request to server
> hadoop14,16020,1552410583724 failed due to
> org.apache.hadoop.hbase.ipc.FailedServerException: Call to
> hadoop14/172.16.1.131:16020 failed on local exception:
> org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the
> failed servers list: hadoop14/172.16.1.131:16020, try=5, retrying...
> 2019-03-13 01:09:55,533 WARN [RSProcedureDispatcher-pool4-t7]
> procedure.RSProcedureDispatcher: request to server
> hadoop14,16020,1552410583724 failed due to
> org.apache.hadoop.hbase.ipc.FailedServerException: Call to
> hadoop14/172.16.1.131:16020 failed on local exception:
> org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the
> failed servers list: hadoop14/172.16.1.131:16020, try=6, retrying...
> 2019-03-13 01:09:55,638 WARN [RSProcedureDispatcher-pool4-t8]
> procedure.RSProcedureDispatcher: request to server
> hadoop14,16020,1552410583724 failed due to
> org.apache.hadoop.hbase.ipc.FailedServerException: Call to
> hadoop14/172.16.1.131:16020 failed on local exception:
> org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the
> failed servers list: hadoop14/172.16.1.131:16020, try=7, retrying...
> 2019-03-13 01:09:55,755 WARN [RSProcedureDispatcher-pool4-t9]
> procedure.RSProcedureDispatcher: request to server
> hadoop14,16020,1552410583724 failed due to
> org.apache.hadoop.hbase.ipc.FailedServerException: Call to
> hadoop14/172.16.1.131:16020 failed on local exception:
> org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the
> failed servers list: hadoop14/172.16.1.131:16020, try=8, retrying...
> {code}