[
https://issues.apache.org/jira/browse/HBASE-22236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821930#comment-16821930
]
Duo Zhang commented on HBASE-22236:
-----------------------------------
OK, the problem is in this method:
{code}
static boolean canUpdateOnError(HRegionLocation loc, HRegionLocation oldLoc) {
  // Do not need to update if no such location, or the location is newer, or the location is not
  // the same with us
  return oldLoc != null && oldLoc.getSeqNum() <= loc.getSeqNum() &&
    oldLoc.getServerName().equals(loc.getServerName());
}
{code}
Here oldLoc.getServerName() returns null, so we get an NPE. The log below shows
that the old location's server name is null (hostname=null).
{noformat}
2019-04-18 16:54:05,605 DEBUG [Default-IPC-NioEventLoopGroup-8-5]
client.AsyncRegionLocatorHelper(59): Try updating
region=async,111,1555606423724.4b28e02c280866c0ac63dc1f20e9c274.,
hostname=asf904.gq1.ygridcore.net,34751,1555606417384, seqNum=9 , the old value
is region=async,111,1555606444785.9f87a8c0763028897001a6b574f9bcd5.,
hostname=null, seqNum=1,
error=org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException:
async,111,1555606423724.4b28e02c280866c0ac63dc1f20e9c274. is not online on
asf904.gq1.ygridcore.net,34751,1555606417384
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3363)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3340)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1441)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2523)
at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:132)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
{noformat}
It can be fixed by adding a null check. But first I want to find out why we can
cache an HRegionLocation with a null server name...
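
For illustration only, the null check could look roughly like the sketch below. It is not the attached patch. Loc is a hypothetical stand-in for HRegionLocation that models the server name as a plain String, and returning false when the cached server name is null is an assumption; the right behavior depends on why such an entry gets cached in the first place.

```java
// Hypothetical stand-in for HRegionLocation: only the two fields the check reads.
final class Loc {
  private final String serverName; // may be null, as in the log above (hostname=null)
  private final long seqNum;

  Loc(String serverName, long seqNum) {
    this.serverName = serverName;
    this.seqNum = seqNum;
  }

  String getServerName() {
    return serverName;
  }

  long getSeqNum() {
    return seqNum;
  }
}

final class LocationUpdateCheck {
  // Same conditions as the original method, plus a guard against a null server name.
  static boolean canUpdateOnError(Loc loc, Loc oldLoc) {
    // Assumption: a cached entry without a server name is treated like a missing entry.
    if (oldLoc == null || oldLoc.getServerName() == null) {
      return false;
    }
    return oldLoc.getSeqNum() <= loc.getSeqNum()
        && oldLoc.getServerName().equals(loc.getServerName());
  }
}
```

With the values from the log above (old entry with hostname=null, seqNum=1), this version returns false instead of throwing.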
> TestAsyncTableGetMultiThreaded sometimes timed out
> --------------------------------------------------
>
> Key: HBASE-22236
> URL: https://issues.apache.org/jira/browse/HBASE-22236
> Project: HBase
> Issue Type: Bug
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Major
> Attachments: HBASE-22236.patch
>
>
> https://builds.apache.org/job/HBase-Flaky-Tests/job/master/2992/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt/*view*/
> After this line
> {noformat}
> 2019-04-14 04:44:41,736 INFO [PEWorker-12]
> procedure2.ProcedureExecutor(1410): Finished pid=117, state=SUCCESS,
> hasLock=false; TransitRegionStateProcedure table=hbase:meta,
> region=1588230740, REOPEN/MOVE in 2.0690sec
> {noformat}
> It seems we just do nothing until the test times out.
> There is also no main thread in the dump of hanging threads, which is a bit
> strange, although all the get threads are hanging there.
> Let me add some logs for better debugging first.