[ 
https://issues.apache.org/jira/browse/HBASE-20864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541451#comment-16541451
 ] 

Duo Zhang commented on HBASE-20864:
-----------------------------------

I think the problem is here:
{noformat}
2018-07-09 20:03:38,716 INFO  [PEWorker-9] assignment.RegionStateStore: 
pid=2308 updating hbase:meta row=7a5b2c7b4b1edaba7f90b45f3e536293, 
regionState=OPENING
2018-07-09 20:03:38,716 INFO  [PEWorker-8] assignment.RegionStateStore: 
pid=2309 updating hbase:meta row=7e9317c9b32e95b2e6482ef4a7145078, 
regionState=OPENING, regionLocation=e010125049164.bja,60020,1531136465378
2018-07-09 20:03:38,716 INFO  [PEWorker-3] assignment.RegionStateStore: 
pid=2305 updating hbase:meta row=4423e4182457c5b573729be4682cc3a3, 
regionState=OPENING
2018-07-09 20:03:38,716 INFO  [PEWorker-15] assignment.RegionStateStore: 
pid=2306 updating hbase:meta row=fc5a65649a2462683a380f9f833151c3, 
regionState=OPENING, regionLocation=e010125048016.bja,60020,1531137190779
2018-07-09 20:03:38,716 INFO  [PEWorker-16] assignment.RegionStateStore: 
pid=2307 updating hbase:meta row=30d22d10f12cee0ed3603a447ee710e2, 
regionState=OPENING, regionLocation=e010125048016.bja,60020,1531137190779
2018-07-09 20:03:38,716 INFO  [PEWorker-1] assignment.RegionStateStore: 
pid=2304 updating hbase:meta row=58cd377b1c46faf98c3a5ee61b4c97fa, 
regionState=OPENING
{noformat}

You can see that for 4423e4182457c5b573729be4682cc3a3 there is no 
regionLocation information. The meta table actually has two fields that 
record the location of a region, and the OPENING and OPEN updates write 
different fields. When loading meta, we use the location written at OPENING.
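
To make the two-field behaviour concrete, here is a minimal sketch of how a stale location could survive in a meta row. It is a toy model, not the real RegionStateStore code: the class MetaLocationSketch, the column names openingLocation and openLocation, and the server names are all hypothetical, and the scenario (an OPENING update that carries no location) is only an assumption based on the log above.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of one hbase:meta row; NOT the real HBase code.
// The column names below are hypothetical placeholders.
final class MetaLocationSketch {
    static final String STATE = "state";
    static final String OPENING_LOCATION = "openingLocation"; // written on OPENING
    static final String OPEN_LOCATION = "openLocation";       // written on OPEN

    // OPENING update: the location field is only written when a location is given.
    static void updateOpening(Map<String, String> row, String location) {
        row.put(STATE, "OPENING");
        if (location != null) {
            row.put(OPENING_LOCATION, location);
        }
    }

    // OPEN update: writes the other location field.
    static void updateOpen(Map<String, String> row, String location) {
        row.put(STATE, "OPEN");
        row.put(OPEN_LOCATION, location);
    }

    // Loading meta: the region location is taken from the field written on OPENING.
    static String loadLocation(Map<String, String> row) {
        return row.get(OPENING_LOCATION);
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        // Earlier assignment to the server that later crashes.
        updateOpening(row, "deadServer");
        updateOpen(row, "deadServer");
        // Reassignment where the OPENING update carries no location.
        updateOpening(row, null);
        updateOpen(row, "liveServer");
        System.out.println("loaded location = " + loadLocation(row));
        System.out.println("last OPEN location = " + row.get(OPEN_LOCATION));
    }
}
```

In this model the load step reports deadServer while the last OPEN update recorded liveServer, which has the same shape as the mismatch between regionLocation and lastHost in the "Load hbase:meta entry" line quoted in the issue description.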

We need to dig further into why there is no regionLocation here; this should not happen.

> RS was killed due to master thought the region should be on a already dead 
> server
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-20864
>                 URL: https://issues.apache.org/jira/browse/HBASE-20864
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: Allan Yang
>            Priority: Major
>         Attachments: log.zip
>
>
> When I was running ITBLL with our internal 2.0.0 version (with 2.0.1 
> backported and with two other issues: HBASE-20706, HBASE-20752), I found two 
> of my RSes killed by the master because the master had a different region 
> state from those RSes. It is very strange that the master thought these 
> regions should be on an already dead server. There might be a serious bug, 
> but I haven't found it yet. Here is the process:
> 1. e010125048153.bja,60020,1531137365840 crashed, and 
> 4423e4182457c5b573729be4682cc3a3 was clearly assigned to 
> e010125049164.bja,60020,1531136465378 during ServerCrashProcedure
> {code:java}
> 2018-07-09 20:03:32,443 INFO  [PEWorker-10] procedure.ServerCrashProcedure: 
> Start pid=2303, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure 
> server=e010125048153.bja,60020,1531137365840, splitWal=true, meta=false
> 2018-07-09 20:03:39,220 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=294,queue=24,port=60000] 
> assignment.RegionTransitionProcedure: Received report OPENED seqId=16021, 
> pid=2305, ppid=2303, state=RUNNABLE:REGION_TRANSITION_DISPATCH; 
> AssignProcedure table=IntegrationTestBigLinkedList, 
> region=4423e4182457c5b573729be4682cc3a3; rit=OPENING, 
> location=e010125049164.bja,60020,1531136465378
> 2018-07-09 20:03:39,220 INFO  [PEWorker-13] assignment.RegionStateStore: 
> pid=2305 updating hbase:meta row=4423e4182457c5b573729be4682cc3a3, 
> regionState=OPEN, openSeqNum=16021, 
> regionLocation=e010125049164.bja,60020,1531136465378
> 2018-07-09 20:03:43,190 INFO  [PEWorker-12] procedure2.ProcedureExecutor: 
> Finished pid=2303, state=SUCCESS; ServerCrashProcedure 
> server=e010125048153.bja,60020,1531137365840, splitWal=true, meta=false in 
> 10.7490sec
> {code}
> 2. A modify table happened later, and 4423e4182457c5b573729be4682cc3a3 was 
> reopened on e010125049164.bja,60020,1531136465378
> {code:java}
> 2018-07-09 20:04:39,929 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=295,queue=25,port=60000] 
> assignment.RegionTransitionProcedure: Received report OPENED seqId=16024, 
> pid=2351, ppid=2314, state=RUNNABLE:REGION_TRANSITION_DISPATCH; 
> AssignProcedure table=IntegrationTestBigLinkedList, 
> region=4423e4182457c5b573729be4682cc3a3, 
> target=e010125049164.bja,60020,1531136465378; rit=OPENING, 
> location=e010125049164.bja,60020,1531136465378
> 2018-07-09 20:04:40,554 INFO  [PEWorker-6] assignment.RegionStateStore: 
> pid=2351 updating hbase:meta row=4423e4182457c5b573729be4682cc3a3, 
> regionState=OPEN, openSeqNum=16024, 
> regionLocation=e010125049164.bja,60020,1531136465378
> {code}
> 3. The active master was killed and the backup master took over, but when 
> loading the meta entry, it clearly showed 4423e4182457c5b573729be4682cc3a3 
> on the previous dead server e010125048153.bja,60020,1531137365840. That is 
> very strange!
> {code:java}
> 2018-07-09 20:06:17,985 INFO  [master/e010125048016:60000] 
> assignment.RegionStateStore: Load hbase:meta entry 
> region=4423e4182457c5b573729be4682cc3a3, regionState=OPEN, 
> lastHost=e010125049164.bja,60020,1531136465378, 
> regionLocation=e010125048153.bja,60020,1531137365840, openSeqNum=16024
> {code}
> 4. The RS was killed
> {code:java}
> 2018-07-09 20:06:20,265 WARN  
> [RpcServer.default.FPBQ.Fifo.handler=297,queue=27,port=60000] 
> assignment.AssignmentManager: Killing e010125049164.bja,60020,1531136465378: 
> rit=OPEN, location=e010125048153.bja,60020,1531137365840, 
> table=IntegrationTestBigLinkedList, 
> region=4423e4182457c5b573729be4682cc3a3reported OPEN on 
> server=e010125049164.bja,60020,1531136465378 but state has otherwise.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
