[ https://issues.apache.org/jira/browse/HBASE-20131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16393880#comment-16393880 ]
stack commented on HBASE-20131:
-------------------------------
It's hard to compare branch-1 and branch-2. Also, the general idea was that a
Procedure would be self-contained: all of its logic would live inside the
Procedure itself. The MoveProcedure has no prepare step, unlike most others,
which check that they are OK to run (see for example the AssignProcedure#start...).
This seems like an oversight.
We should add a prepare step to the MoveProcedure that checks whether it can run.
If the table is offline, the master is stopped, or the cluster is going down, we
should not run. It would probably be good to expose this method so it could also
be called at the start of the RsRPCServer#moveRegion method. Doing the basic
checks there means we fail early and send the failure directly back to the user.
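A minimal sketch of such a shared check, assuming a hypothetical read-only view of master state. MasterState, MoveRejectedException, checkCanMove and rejectionReason are illustrative names only, not HBase APIs; the point is that one method serves both as the procedure's prepare step and as an early check in the RPC handler.

```java
// Hypothetical sketch of a shared "prepare" check for a move request.
// MasterState, MoveRejectedException, checkCanMove and rejectionReason are
// illustrative names only; they are not HBase APIs.
public class MovePreconditions {

  /** Hypothetical read-only view of the master/cluster state the check needs. */
  interface MasterState {
    boolean isStopped();
    boolean isClusterShuttingDown();
    boolean isTableEnabled(String tableName);
  }

  /** Thrown when a move should be refused before any procedure is submitted. */
  static class MoveRejectedException extends Exception {
    MoveRejectedException(String message) {
      super(message);
    }
  }

  /**
   * The shared check: the procedure would run it as its prepare step, and the
   * RPC handler could call it first so the client sees the failure immediately.
   */
  static void checkCanMove(MasterState state, String tableName) throws MoveRejectedException {
    if (state.isStopped()) {
      throw new MoveRejectedException("master is stopped");
    }
    if (state.isClusterShuttingDown()) {
      throw new MoveRejectedException("cluster is going down");
    }
    if (!state.isTableEnabled(tableName)) {
      throw new MoveRejectedException("table " + tableName + " is not enabled");
    }
  }

  /** Returns the rejection reason, or null if the move may proceed. */
  static String rejectionReason(MasterState state, String tableName) {
    try {
      checkCanMove(state, tableName);
      return null;
    } catch (MoveRejectedException e) {
      return e.getMessage();
    }
  }

  /** Fixed-state stub, handy for trying the check without a running master. */
  static MasterState fixedState(boolean stopped, boolean goingDown, boolean tableEnabled) {
    return new MasterState() {
      @Override public boolean isStopped() { return stopped; }
      @Override public boolean isClusterShuttingDown() { return goingDown; }
      @Override public boolean isTableEnabled(String tableName) { return tableEnabled; }
    };
  }

  public static void main(String[] args) {
    // A disabled table is rejected up front instead of reaching the procedure store.
    System.out.println(rejectionReason(fixedState(false, false, false), "usertable"));
    // A running master with an enabled table passes, so this prints "null".
    System.out.println(rejectionReason(fixedState(false, false, true), "usertable"));
  }
}
```

Throwing a checked exception keeps the prepare step and the RPC path honest about the failure, while rejectionReason gives the RPC handler a simple string to send back to the client.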
I was going to review all our procedures to make sure each has a prepare step.
Mind if I add to your patch, [~elserj]? Thanks, sir.
> NPE in MoveRegionProcedure via IntegrationTestLoadAndVerify with CM
> -------------------------------------------------------------------
>
> Key: HBASE-20131
> URL: https://issues.apache.org/jira/browse/HBASE-20131
> Project: HBase
> Issue Type: Bug
> Components: proc-v2
> Reporter: Josh Elser
> Assignee: Josh Elser
> Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-20131.001.patch
>
>
> I believe the error is that a MoveRegionProcedure comes in via ChaosMonkey
> for an unassigned region belonging to a table that was disabled (also by CM).
> The region has no current location, so we try to set a null original location
> into the protobuf, which fails with an NPE.
> {noformat}
> 2018-03-02 23:07:00,146 ERROR [RpcServer.default.FPBQ.Fifo.handler=23,queue=2,port=20000] ipc.RpcServer: Unexpected throwable object
> java.lang.NullPointerException
>     at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos$MoveRegionStateData$Builder.setSourceServer(MasterProcedureProtos.java:26127)
>     at org.apache.hadoop.hbase.master.assignment.MoveRegionProcedure.serializeStateData(MoveRegionProcedure.java:133)
>     at org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProtoProcedure(ProcedureUtil.java:198)
>     at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.writeEntry(ProcedureWALFormat.java:211)
>     at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.writeInsert(ProcedureWALFormat.java:222)
>     at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.insert(WALProcedureStore.java:490)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.submitProcedure(ProcedureExecutor.java:863)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.submitProcedure(ProcedureExecutor.java:832)
>     at org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.submitProcedure(ProcedureSyncWait.java:111)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.moveAsync(AssignmentManager.java:561)
>     at org.apache.hadoop.hbase.master.HMaster.move(HMaster.java:1707)
>     at org.apache.hadoop.hbase.master.MasterRpcServices.moveRegion(MasterRpcServices.java:1324)
>     at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> IntegrationTestLoadAndVerify also failed, but I'm not sure whether that is
> related to this or just a problem with the test. The test failed because the
> table was left offline after it was disabled and apparently was never
> re-enabled. Still debugging that side.
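On the NPE itself: the trace shows it is raised inside the generated builder's setSourceServer, called from MoveRegionProcedure.serializeStateData, and generated protobuf setters for message fields throw NullPointerException when handed null. A minimal sketch of refusing the move before anything is serialized, assuming a hypothetical in-memory region-to-server map (SourceServerGuard, setLocation, clearLocation, sourceServerFor and planMove are illustrative, not HBase code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: SourceServerGuard and its methods are illustrative, not HBase code.
// Region locations are modeled as a plain map in which unassigned regions have no entry.
public class SourceServerGuard {
  private final Map<String, String> regionLocations = new HashMap<>();

  void setLocation(String region, String server) {
    regionLocations.put(region, server);
  }

  void clearLocation(String region) {
    // e.g. the region was unassigned because its table was disabled
    regionLocations.remove(region);
  }

  /** Current hosting server, or empty if the region is not assigned anywhere. */
  Optional<String> sourceServerFor(String region) {
    return Optional.ofNullable(regionLocations.get(region));
  }

  /**
   * Validates a move before anything is serialized: a missing source server becomes
   * a descriptive error here instead of a null passed to a protobuf builder setter.
   */
  String planMove(String region, String destination) {
    String source = sourceServerFor(region)
        .orElseThrow(() -> new IllegalStateException(
            "region " + region + " is not assigned; refusing to move it"));
    return "move " + region + " from " + source + " to " + destination;
  }

  public static void main(String[] args) {
    SourceServerGuard guard = new SourceServerGuard();
    guard.setLocation("region-a", "rs1");
    System.out.println(guard.planMove("region-a", "rs2"));

    guard.clearLocation("region-a");
    try {
      guard.planMove("region-a", "rs2");
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Returning an Optional from the lookup forces the caller to decide what an unassigned region means, rather than letting a null travel down to serialization.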