[
https://issues.apache.org/jira/browse/HBASE-21078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589370#comment-16589370
]
stack commented on HBASE-21078:
-------------------------------
Not sure if this is caused by my patch, but on the cluster I see something
strange: an MP (MoveRegionProcedure) starts up its UP (UnassignProcedure),
which then hangs here and is never scheduled again:
{code}
2018-08-22 11:05:29,929 INFO
[RpcServer.default.FPBQ.Fifo.handler=3,queue=3,port=16000] master.HMaster:
Client=stack//10.17.240.20 move hri=e5ed5607d30e9813aa7206048d5a94fd,
source=ve0530.halxg.cloudera.com,16020,1534960226278,
destination=ve0540.halxg.cloudera.com,16020,1534960226357, running balancer
2018-08-22 11:05:30,144 INFO [PEWorker-1] procedure.MasterProcedureScheduler:
pid=288, state=RUNNABLE:MOVE_REGION_UNASSIGN, hasLock=false;
MoveRegionProcedure hri=e5ed5607d30e9813aa7206048d5a94fd,
source=ve0530.halxg.cloudera.com,16020,1534960226278,
destination=ve0540.halxg.cloudera.com,16020,1534960226357 checking lock on
e5ed5607d30e9813aa7206048d5a94fd
2018-08-22 11:05:30,195 INFO [PEWorker-1] procedure2.ProcedureExecutor:
Initialized subprocedures=[{pid=289, ppid=288,
state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; UnassignProcedure
table=IntegrationTestBigLinkedList, region=e5ed5607d30e9813aa7206048d5a94fd,
server=ve0530.halxg.cloudera.com,16020,1534960226278}]
2018-08-22 11:05:30,252 INFO [PEWorker-1] procedure.MasterProcedureScheduler:
pid=289, ppid=288, state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false;
UnassignProcedure table=IntegrationTestBigLinkedList,
region=e5ed5607d30e9813aa7206048d5a94fd,
server=ve0530.halxg.cloudera.com,16020,1534960226278 checking lock on
e5ed5607d30e9813aa7206048d5a94fd
{code}
The parent MP (pid=288) holds the exclusive lock on the region and won't let
go (from the UI):
{code}
REGION: e5ed5607d30e9813aa7206048d5a94fd
Lock type: EXCLUSIVE
Owner procedure: { ID => '288', PARENT_ID => '-1', STATE => 'WAITING', OWNER =>
'stack', TYPE => 'MoveRegionProcedure hri=e5ed5607d30e9813aa7206048d5a94fd,
source=ve0530.halxg.cloudera.com,16020,1534960226278,
destination=ve0540.halxg.cloudera.com,16020,1534960226357', START_TIME => 'Wed
Aug 22 11:05:29 PDT 2018', LAST_UPDATE => 'Wed Aug 22 11:05:30 PDT 2018',
PARAMETERS => [ { state => [ '1', '2' ] }, { regionId => '1534960773088',
tableName => { namespace => 'ZGVmYXVsdA==', qualifier =>
'SW50ZWdyYXRpb25UZXN0QmlnTGlua2VkTGlzdA==' }, startKey => 'BhjRNw==', endKey =>
'DDDDDDDDDDA=', offline => 'false', split => 'false', replicaId => '0' }, {
sourceServer => { hostName => 've0530.halxg.cloudera.com', port => '16020',
startCode => '1534960226278' }, destinationServer => { hostName =>
've0540.halxg.cloudera.com', port => '16020', startCode => '1534960226357' } }
] }
{code}
> [amv2] CODE-BUG NPE in RTP doing Unassign
> -----------------------------------------
>
> Key: HBASE-21078
> URL: https://issues.apache.org/jira/browse/HBASE-21078
> Project: HBase
> Issue Type: Bug
> Components: amv2
> Affects Versions: 2.0.1
> Reporter: stack
> Assignee: stack
> Priority: Major
> Fix For: 2.0.2
>
> Attachments: HBASE-21078.branch-2.0.001.patch,
> HBASE-21078.branch-2.0.002.patch, HBASE-21078.branch-2.0.003.patch
>
>
> Saw this in a run against the tip of branch-2.0. The region had just finished
> being split when the move went to run.
> {code}
> 2018-08-18 16:55:14,908 INFO [PEWorker-2] procedure2.ProcedureExecutor:
> Finished pid=2028, state=SUCCESS, hasLock=false; SplitTableRegionProcedure
> table=IntegrationTestBigLinkedList, parent=c3f199b5af62ae2ff8f8b6426b21d95d,
> daughterA=31ccbf098ae615ce30f28ec84c956b8f,
> daughterB=1890b4c96736f223f31efef11c817c90 in 9.0090sec
> 2018-08-18 16:55:14,908 INFO [PEWorker-16]
> procedure.MasterProcedureScheduler: pid=2038, ppid=2030,
> state=RUNNABLE:MOVE_REGION_UNASSIGN, hasLock=false; MoveRegionProcedure
> hri=c3f199b5af62ae2ff8f8b6426b21d95d,
> source=ve0540.halxg.cloudera.com,16020,1534632630737,
> destination=ve0540.halxg.cloudera.com,16020,1534632630737 checking lock on
> c3f199b5af62ae2ff8f8b6426b21d95d
> 2018-08-18 16:55:14,958 INFO [PEWorker-16] procedure2.ProcedureExecutor:
> Initialized subprocedures=[{pid=2095, ppid=2038,
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=false; UnassignProcedure
> table=IntegrationTestBigLinkedList, region=c3f199b5af62ae2ff8f8b6426b21d95d,
> server=ve0540.halxg.cloudera.com,16020,1534632630737}]
> 2018-08-18 16:55:15,008 INFO [PEWorker-3]
> procedure.MasterProcedureScheduler: pid=2095, ppid=2038,
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=false; UnassignProcedure
> table=IntegrationTestBigLinkedList, region=c3f199b5af62ae2ff8f8b6426b21d95d,
> server=ve0540.halxg.cloudera.com,16020,1534632630737 checking lock on
> c3f199b5af62ae2ff8f8b6426b21d95d
> 2018-08-18 16:55:15,085 ERROR [PEWorker-3] procedure2.ProcedureExecutor:
> CODE-BUG: Uncaught runtime exception: pid=2095, ppid=2038,
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure
> table=IntegrationTestBigLinkedList, region=c3f199b5af62ae2ff8f8b6426b21d95d,
> server=ve0540.halxg.cloudera.com,16020,1534632630737
> java.lang.NullPointerException
> at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
> at
> org.apache.hadoop.hbase.master.assignment.RegionStates.getOrCreateServer(RegionStates.java:1097)
> at
> org.apache.hadoop.hbase.master.assignment.RegionStates.addRegionToServer(RegionStates.java:1125)
> at
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1477)
> at
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:204)
> at
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:345)
> at
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:97)
> at
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:873)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1556)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1344)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:76)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1854)
> {code}
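For anyone reading the trace above: the top frame is ConcurrentHashMap.get, which throws a NullPointerException when handed a null key. That suggests (an inference from the stack trace, not confirmed in this thread) that the server name reaching RegionStates.getOrCreateServer was null. A minimal standalone sketch of that JDK behavior follows; the class, the map, and the method name are hypothetical and are not HBase code.

```java
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyLookupDemo {
    // Hypothetical stand-in for a map keyed by server name; not the HBase structure.
    private static final ConcurrentHashMap<String, String> SERVERS = new ConcurrentHashMap<>();

    /** Returns true if looking up the given key raised a NullPointerException. */
    public static boolean lookupThrowsNpe(String serverName) {
        try {
            // ConcurrentHashMap rejects null keys on get(), unlike HashMap.
            SERVERS.get(serverName);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String server = "ve0540.halxg.cloudera.com,16020,1534632630737";
        SERVERS.put(server, "online");
        System.out.println(lookupThrowsNpe(server)); // prints "false"
        System.out.println(lookupThrowsNpe(null));   // prints "true"
    }
}
```

If that reading is right, a null check on the server name before the map lookup would turn the uncaught runtime exception into something the procedure can handle, though where the null originates is a separate question.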