[ 
https://issues.apache.org/jira/browse/HBASE-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375010#comment-16375010
 ] 

stack commented on HBASE-20043:
-------------------------------

bq. Do the move region commands fail after we've already killed a server? That 
seems reasonable?

As it happens, this turned out to be something else. This is a nice test: it 
does random stuff to see what happens. There is a random move-region Action 
that the Chaos Monkey runs as one of its optional operations. This 'Action' was 
stale, from the hbase1 AM days. It was doing an unassign only (pre-AMv2, if you 
did an unassign, we'd close the region and then open it elsewhere; in hbase2, 
when you unassign a region, it just stays closed). So an unassign was 
happening, and then we'd do a move of a random region, but that region had been 
unassigned ('closed') and was not deployed anywhere. So the assert was 'right': 
we had a region with a null source. (I fixed the random move action.)
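
To make the failure mode concrete, here is a toy model of it. This is only a 
sketch: RegionMoveSketch and its methods are hypothetical names, not the HBase 
API and not the actual fix. It shows an unassigned region having no source 
server, and a move that checks for that instead of tripping an assert.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model of region deployment. Hypothetical; not the HBase API. */
class RegionMoveSketch {
    // Region name -> hosting server. An unassigned (closed) region has no entry.
    private final Map<String, String> deployed = new HashMap<>();

    void assign(String region, String server) {
        deployed.put(region, server);
    }

    // hbase2-style unassign as described above: the region closes and stays closed.
    void unassign(String region) {
        deployed.remove(region);
    }

    // The "source" of a move; empty when the region is not deployed anywhere.
    Optional<String> sourceOf(String region) {
        return Optional.ofNullable(deployed.get(region));
    }

    /** Moves the region and returns true, or returns false if it has no source. */
    boolean moveIfDeployed(String region, String destination) {
        if (!deployed.containsKey(region)) {
            return false; // nothing to move: region was unassigned
        }
        deployed.put(region, destination);
        return true;
    }
}
```

A random-move action built this way would skip a region that an earlier 
unassign had closed, rather than asking the master to move it.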

Other interesting stuff in here: we were using a hadoop2 netty, but netty 
changed in hadoop3, which was causing the job to fail. Also, at one point 
during this test run we had 1k threads up in the one JVM (we need to pare back 
how many we spin up by default: have executors run w/ keepalive and die off if 
there is no work).
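
For the "keepalive and die-off" idea, a minimal sketch using the plain JDK 
ThreadPoolExecutor follows. IdleShrinkingPool and its parameters are 
hypothetical names for illustration; this is not how HBase configures its 
executors, just one way to get threads that go away when idle.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: a pool whose threads exit after sitting idle. */
class IdleShrinkingPool {
    static ThreadPoolExecutor create(int maxThreads, long keepAliveSeconds) {
        // core == max with an unbounded queue: at most maxThreads threads,
        // created lazily as tasks arrive.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads, maxThreads,
            keepAliveSeconds, TimeUnit.SECONDS,
            new LinkedBlockingQueue<>());
        // Let core threads time out too, so an idle pool shrinks to zero
        // threads. Requires keepAliveSeconds > 0 or this call throws.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }
}
```

With this setup no threads exist until work is submitted, and any thread that 
sees no work for the keepalive period terminates.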

On the "h3 we take longer": because what happens in a run is random, the run 
time seems to be high-variance. I've been running hadoop2 and then hadoop3, and 
they seem to be in the same ballpark: 5-11 minutes or so.

Doing a bit more study before putting up a patch.



> ITBLL fails against hadoop3
> ---------------------------
>
>                 Key: HBASE-20043
>                 URL: https://issues.apache.org/jira/browse/HBASE-20043
>             Project: HBase
>          Issue Type: Bug
>          Components: integration tests
>            Reporter: Mike Drob
>            Assignee: stack
>            Priority: Major
>             Fix For: 2.0.0-beta-2
>
>
> This has been failing for a while. I haven't tried to bisect, but it was 
> before my changes for HBASE-19991 at least.
> {code}
> mvn clean verify -pl hbase-it -Dhadoop.profile=3.0 
> -Dit.test=IntegrationTestBigLinkedList -Dtest=none -am
> {code}
> {code}
> 2018-02-21 16:43:13,265 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=60450] 
> ipc.RpcServer(464): Unexpected throwable object 
> java.lang.AssertionError: 
> hri=IntegrationTestBigLinkedList,\x8E8\xE3\x8E8\xE3\x8E5,1519252895022.236bbedde32e4549691c108a1a7005a8.,
>  source=, destination=mdrob-mbp.hsd1.tx.comcast.net,60456,1519252856027
>       at org.apache.hadoop.hbase.master.HMaster.move(HMaster.java:1691)
>       at 
> org.apache.hadoop.hbase.master.MasterRpcServices.moveRegion(MasterRpcServices.java:1348)
>       at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:406)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> 2018-02-21 16:43:13,276 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=60450] 
> ipc.CallRunner(141): callId: 49 service: MasterService methodName: MoveRegion 
> size: 106 connection: 192.168.1.134:60743 deadline: 1519253053263
> java.io.IOException: 
> hri=IntegrationTestBigLinkedList,\x8E8\xE3\x8E8\xE3\x8E5,1519252895022.236bbedde32e4549691c108a1a7005a8.,
>  source=, destination=mdrob-mbp.hsd1.tx.comcast.net,60456,1519252856027
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:465)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: java.lang.AssertionError: 
> hri=IntegrationTestBigLinkedList,\x8E8\xE3\x8E8\xE3\x8E5,1519252895022.236bbedde32e4549691c108a1a7005a8.,
>  source=, destination=mdrob-mbp.hsd1.tx.comcast.net,60456,1519252856027
>       at org.apache.hadoop.hbase.master.HMaster.move(HMaster.java:1691)
>       at 
> org.apache.hadoop.hbase.master.MasterRpcServices.moveRegion(MasterRpcServices.java:1348)
>       at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:406)
>       ... 3 more
> {code}
> The assertion that it trips is:
> {code}
>     // Now we can do the move
>     RegionPlan rp = new RegionPlan(hri, regionState.getServerName(), dest);
>     assert rp.getDestination() != null: rp.toString() + " " + dest;
>     assert rp.getSource() != null: rp.toString();
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
