[
https://issues.apache.org/jira/browse/HBASE-22626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872941#comment-16872941
]
stack commented on HBASE-22626:
-------------------------------
Is this hbase-2.0.0, as the affected version says? If so, a bunch of fixes went
in on branch-2.0 after the 2.0.0 release. Was the data migrated from branch-1
to 2.0.0? Thanks
> Master assigns the region successfully but fails to update the region state,
> leaving the region state as OPENING in zookeeper. If the master is restarted,
> those OPENING regions are never assigned.
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-22626
> URL: https://issues.apache.org/jira/browse/HBASE-22626
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: 2.0.0
> Reporter: chenwandong
> Priority: Critical
>
> Problem Description
> (1) One of the region servers restarts, which causes all the regions that
> were assigned to it to be migrated.
> (2) The master checks these regions and assigns them to other region servers.
> The assignment succeeds, but the state update fails.
> 2019-06-22 16:44:20,065 INFO [PEWorker-8] procedure2.ProcedureExecutor:
> Finished pid=5038, ppid=4488, state=SUCCESS; AssignProcedure
> table=test061910, region=379c730490b4848f3db065fb25b87452 in 10.5680sec
> ... ...
> 2019-06-22 16:44:38,725 INFO [PEWorker-4]
> procedure.MasterProcedureScheduler: pid=5038, ppid=4488, state=SUCCESS;
> AssignProcedure table=test061910, region=379c730490b4848f3db065fb25b87452
> checking lock on 379c730490b4848f3db065fb25b87452
> 2019-06-22 16:44:38,725 ERROR [PEWorker-4] procedure2.ProcedureExecutor:
> CODE-BUG: Uncaught runtime exception for pid=5038, ppid=4488, state=SUCCESS;
> AssignProcedure table=test061910, region=379c730490b4848f3db065fb25b87452
> java.lang.UnsupportedOperationException: Unhandled state
> REGION_TRANSITION_FINISH; there is no rollback for assignment unless we
> cancel the operation by dropping/disabling the table
> at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.rollback(RegionTransitionProcedure.java:412)
> at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.rollback(RegionTransitionProcedure.java:95)
> at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1373)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1329)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1198)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1761)
> (3) The master is restarted and loads the old region state on startup. The
> region in the OPENING state is never reassigned, and the following log is
> printed.
> 2019-06-22 16:46:26,210 INFO [master/hdp0:16000] assignment.RegionStateStore:
> Load hbase:meta entry region=379c730490b4848f3db065fb25b87452,
> regionState=OPENING, lastHost=hdp2,16020,1561190163476,
> regionLocation=hdp0,16020,1561190163887, openSeqNum=69448
> ... ...
> 2019-06-22 16:51:28,514 WARN [ProcExecTimeout] assignment.AssignmentManager:
> STUCK Region-In-Transition rit=OPENING, location=hdp0,16020,1561190163887,
> table=test061910, region=379c730490b4848f3db065fb25b87452
> ... ....
> 2019-06-22 16:49:28,483 WARN [ProcExecTimeout] assignment.AssignmentManager:
> STUCK Region-In-Transition rit=OPENING, location=hdp0,16020,1561190163887,
> table=test061910, region=379c730490b4848f3db065fb25b87452
> (4) The state in zookeeper
>
> test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452.
> column=info:regioninfo, timestamp=1561192609882, value=\{ENCODED =>
> 379c730490b4848f3db065fb25b87452, NAME =>
> 'test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452.',
> STARTKEY => '00000000000000000031457280', ENDKEY =>
> '00000000000000000041943040'}
>
> test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452.
> column=info:seqnumDuringOpen, timestamp=1561190288613,
> value=\x00\x00\x00\x00\x00\x01\x0FH
>
> test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452.
> column=info:server, timestamp=1561190288613, value=hdp2:16020
>
> test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452.
> column=info:serverstartcode, timestamp=1561190288613, value=1561190163476
>
> test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452.
> column=info:sn, timestamp=1561192609882, value=hdp0,16020,1561190163887
>
> test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452.
> column=info:state, timestamp=1561192609882, value=OPENING
>
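For what it's worth, the info:seqnumDuringOpen value in the dump above can be checked against the openSeqNum=69448 that RegionStateStore logged in (3). Below is a minimal sketch in plain Java, assuming the shell prints the cell as \xNN escapes mixed with printable ASCII and that the value is an 8-byte big-endian long. The names SeqNumDecode, unescape and toLong are made up for illustration and are not HBase APIs.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical helper, not part of HBase.
class SeqNumDecode {
    // Turn a shell-style printable string (\xNN escapes plus literal ASCII
    // characters) back into the raw bytes it stands for.
    static byte[] unescape(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 3 < s.length() && s.charAt(i + 1) == 'x') {
                // Two hex digits follow the "\x" prefix.
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 3;
            } else {
                // Printable character: its ASCII code is the byte value.
                out.write((byte) c);
            }
        }
        return out.toByteArray();
    }

    // Interpret exactly 8 bytes as a big-endian signed long.
    static long toLong(byte[] b) {
        if (b.length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + b.length);
        }
        return ByteBuffer.wrap(b).getLong();
    }

    public static void main(String[] args) {
        // Value copied from the info:seqnumDuringOpen cell in the report.
        String shellValue = "\\x00\\x00\\x00\\x00\\x00\\x01\\x0FH";
        System.out.println(toLong(unescape(shellValue))); // prints 69448
    }
}
```

The trailing H is ASCII 0x48, so the bytes are 00 00 00 00 00 01 0F 48, i.e. 0x10F48 = 69448, which agrees with the openSeqNum in the RegionStateStore log line.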
> Steps to Reproduce
> (1) Modify the state of a normal region from OPEN to OPENING.
> (2) Restart the master and region servers, then check the master log and the
> HBase web UI.
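To see which regions are affected after the restart, the STUCK warnings can be pulled out of the master log. Below is a rough sketch, assuming each warning sits on a single line in the actual log file (the lines quoted above are wrapped by the mail client) and follows the format shown in the report. StuckRegionScan and stuckRegions are hypothetical names, not part of HBase.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log-scanning helper, not part of HBase.
class StuckRegionScan {
    // Matches the message part of the STUCK warning as quoted in the report.
    // Group 1 = RIT state, 2 = location, 3 = table, 4 = encoded region name.
    private static final Pattern STUCK = Pattern.compile(
        "STUCK Region-In-Transition rit=(\\w+), location=(\\S+), "
            + "table=(\\S+), region=([0-9a-f]{32})");

    // Returns encoded region name -> last reported RIT state, in first-seen order.
    static Map<String, String> stuckRegions(List<String> logLines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = STUCK.matcher(line);
            if (m.find()) {
                result.put(m.group(4), m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two lines taken from the report, unwrapped onto single lines.
        List<String> sample = Arrays.asList(
            "2019-06-22 16:51:28,514 WARN [ProcExecTimeout] assignment.AssignmentManager: "
                + "STUCK Region-In-Transition rit=OPENING, location=hdp0,16020,1561190163887, "
                + "table=test061910, region=379c730490b4848f3db065fb25b87452",
            "2019-06-22 16:44:20,065 INFO [PEWorker-8] procedure2.ProcedureExecutor: "
                + "Finished pid=5038, ppid=4488, state=SUCCESS; AssignProcedure "
                + "table=test061910, region=379c730490b4848f3db065fb25b87452 in 10.5680sec");
        System.out.println(stuckRegions(sample));
    }
}
```

Repeated warnings for the same region collapse into one entry, so the result is a short list of regions to look at rather than one line per timeout check.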