[ 
https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854445#comment-13854445
 ] 

Sergey Shelukhin edited comment on HBASE-10210 at 12/20/13 7:15 PM:
--------------------------------------------------------------------

You mean the online servers in the tracker? It does add them to its internal 
list. Can you elaborate a bit?
If they are put into other online servers, wouldn't it make the issue worse? 
As far as I can see in the check...AndAdd method and around it, there's no 
provision for one server to be added twice; if it was already there, the same 
issue will happen: it will expire the "old" one (from ZK), then the report 
gets rejected.
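
To make the sequence concrete, here is a toy model of it. This is only a sketch under assumptions: the class, method, and enum names (ToyServerTracker, register, Result) are hypothetical and are not HBase's ServerManager API, and the tolerateSameStartCode switch is just one conceivable way to handle the timestamp match, not a description of the attached patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only; names are hypothetical and do not mirror HBase's ServerManager. */
class ToyServerTracker {
    /** Outcome of a registration attempt. */
    enum Result { ADDED, ALREADY_KNOWN, REJECTED_AS_DEAD }

    // host:port -> start code of the instance currently considered online
    private final Map<String, Long> online = new HashMap<>();
    // full server names (host:port,startCode) that have been expired
    private final Set<String> dead = new HashSet<>();
    // when true, an identical start code is treated as the same live instance
    private final boolean tolerateSameStartCode;

    ToyServerTracker(boolean tolerateSameStartCode) {
        this.tolerateSameStartCode = tolerateSameStartCode;
    }

    /** Used for both sources: the ZK scan at master startup and the server's own report. */
    synchronized Result register(String hostPort, long startCode) {
        String name = hostPort + "," + startCode;
        if (dead.contains(name)) {
            // Stands in for the YouAreDeadException seen in the log.
            return Result.REJECTED_AS_DEAD;
        }
        Long existing = online.get(hostPort);
        if (existing != null) {
            if (tolerateSameStartCode && existing == startCode) {
                return Result.ALREADY_KNOWN;
            }
            // Naive path: any existing entry for this host:port is expired as stale.
            // With an identical start code, this marks the reporting server itself dead.
            dead.add(hostPort + "," + existing);
        }
        online.put(hostPort, startCode);
        return Result.ADDED;
    }

    public static void main(String[] args) {
        ToyServerTracker naive = new ToyServerTracker(false);
        System.out.println(naive.register("rs-8:60020", 1387451803800L)); // found in ZK: ADDED
        System.out.println(naive.register("rs-8:60020", 1387451803800L)); // reports in: ADDED, old entry expired
        System.out.println(naive.register("rs-8:60020", 1387451803800L)); // next report: REJECTED_AS_DEAD
    }
}
```

With the flag set to true, the second and third calls return ALREADY_KNOWN instead, so the server found in ZK and the server reporting in are treated as one instance.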


was (Author: sershe):
You mean the online servers in the tracker? It does add them to its internal 
list. Can you elaborate a bit?
If they are put into other online servers, wouldn't it make the issue worse? 
As far as I can see in the check...AndAdd method and around it, there's no 
provision for one server to be added twice; if it was already there, the same 
issue will happen: the report gets rejected.

> during master startup, RS can be you-are-dead-ed by master in error
> -------------------------------------------------------------------
>
>                 Key: HBASE-10210
>                 URL: https://issues.apache.org/jira/browse/HBASE-10210
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HBASE-10210.patch
>
>
> Not sure of the root cause yet; I am at the "how did this ever work" stage.
> We see this problem in 0.96.1, but didn't in 0.96.0 + some patches.
> It looks like RS information arriving from two sources, ZK and the server 
> itself, can conflict. Master doesn't handle such cases (timestamp match), and 
> anyway, technically, timestamps can collide for two separate servers.
> So, master YouAreDead-s the already-recorded reporting RS, and adds it too. 
> Then it discovers that the new server has died with a fatal error!
> Note the threads.
> Addition is called both from master initialization and from RPC.
> {noformat}
> 2013-12-19 11:16:45,290 INFO  
> [master:h2-ubuntu12-sec-1387431063-hbase-10:60000] master.ServerManager: 
> Finished waiting for region servers count to settle; checked in 2, slept for 
> 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running.
> 2013-12-19 11:16:45,290 INFO  
> [master:h2-ubuntu12-sec-1387431063-hbase-10:60000] master.ServerManager: 
> Registering 
> server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
> 2013-12-19 11:16:45,290 INFO  
> [master:h2-ubuntu12-sec-1387431063-hbase-10:60000] master.HMaster: Registered 
> server found up in zk but who has not yet reported in: 
> h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
> 2013-12-19 11:16:45,380 INFO  [RpcServer.handler=4,port=60000] 
> master.ServerManager: Triggering server recovery; existingServer 
> h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 
> looks stale, new 
> server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
> 2013-12-19 11:16:45,380 INFO  [RpcServer.handler=4,port=60000] 
> master.ServerManager: Master doesn't enable ServerShutdownHandler during 
> initialization, delay expiring server 
> h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
> ...
> 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=60000] 
> master.HMaster: Region server 
> h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 
> reported a fatal error:
> ABORTING region server 
> h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: 
> org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; 
> currently processing 
> h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as 
> dead server
> {noformat}
> Presumably some of the recent ZK listener related changes b



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
