[ https://issues.apache.org/jira/browse/HBASE-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462009#comment-16462009 ]

Xiaolin Ha commented on HBASE-19997:
------------------------------------

Hello [~stack],
When doing a rolling upgrade of the masters from 0.98 to 2.0 while the regionservers are still on 0.98, both the master and the regionservers hit an NPE. The regionserver logs are:
2018-05-03,11:46:47,996 ERROR [PriorityRpcServer.handler=0,queue=0,port=37900] org.apache.hadoop.ipc.RpcServer: Unexpected throwable object
java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:4807)
        at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:21048)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2061)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:125)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:152)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:128)
        at java.lang.Thread.run(Thread.java:745)
2018-05-03,11:46:48,335 ERROR [PriorityRpcServer.handler=1,queue=0,port=37900] org.apache.hadoop.ipc.RpcServer: Unexpected throwable object
java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:4807)
        at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:21048)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2061)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:125)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:152)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:128)
        at java.lang.Thread.run(Thread.java:745)
and the master logs are:
2018-05-03,11:16:34,930 INFO [PEWorker-3] org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Dispatch pid=2, ppid=1, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=hbase:meta, region=1588230740, target=c4-hadoop-tst-st28.bj,37900,1525316678267; rit=OPENING, location=c4-hadoop-tst-st28.bj,37900,1525316678267
2018-05-03,11:16:35,081 INFO [ProcedureDispatcherTimeoutThread] org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher: Fallback to compat rpc execution for serverName=c4-hadoop-tst-st28.bj,37900,1525316678267 version=401419
2018-05-03,11:16:35,193 WARN [RSProcedureDispatcher-pool3-t2] org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher: Failed dispatch to server=c4-hadoop-tst-st28.bj,37900,1525316678267 try=0
java.io.IOException: java.io.IOException
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2094)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:125)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:152)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:128)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:4807)
        at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:21048)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2061)
        ... 4 more
@c4-hadoop-tst-st28.bj/10.132.2.27:37900
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:100)
        at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:90)
        at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:358)
        at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:335)
        at org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher$OpenRegionRemoteCall.sendRequest(RSProcedureDispatcher.java:392)
        at org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher$OpenRegionRemoteCall.call(RSProcedureDispatcher.java:374)
        at org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher$OpenRegionRemoteCall.call(RSProcedureDispatcher.java:359)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): java.io.IOException
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2094)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:125)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:152)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:128)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:4807)
        at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:21048)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2061)
        ... 4 more
@c4-hadoop-tst-st28.bj/10.132.2.27:37900
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:387)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
        at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
        at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
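
The "Fallback to compat rpc execution ... version=401419" line suggests the 2.0 master looked at the regionserver's reported version number and fell back to the older openRegion admin RPC (the OpenRegionRemoteCall frames above). A minimal sketch for decoding that number follows. It assumes versions are packed as major << 20 | minor << 12 | patch (our reading of HBase's VersionInfoUtil) and that the compat cut-off is 2.0.0; the class and method names are hypothetical. Under those assumptions, 401419 decodes to 0.98.11, consistent with the 0.98 regionservers described above.

```java
/**
 * Hypothetical sketch: decode the packed version number that the master logs
 * ("version=401419"). Assumes the packing major << 20 | minor << 12 | patch.
 */
public class RsVersionDecode {

    // Assumed threshold: servers reporting a version below 2.0.0 get compat RPCs.
    static final int VERSION_2_0 = 2 << 20;

    // Pack major/minor/patch into one int using the assumed bit layout.
    static int pack(int major, int minor, int patch) {
        return (major << 20) | (minor << 12) | patch;
    }

    // Reverse of pack(): 8 bits of minor, 12 bits of patch, the rest is major.
    static String unpack(int version) {
        int major = version >>> 20;
        int minor = (version >>> 12) & 0xFF;
        int patch = version & 0xFFF;
        return major + "." + minor + "." + patch;
    }

    // True when the (assumed) dispatcher would fall back to compat execution.
    static boolean needsCompatRpc(int version) {
        return version < VERSION_2_0;
    }

    public static void main(String[] args) {
        int reported = 401419; // value taken from the master log line
        System.out.println(unpack(reported));         // 0.98.11
        System.out.println(needsCompatRpc(reported)); // true
    }
}
```

A packed int keeps version comparisons to a single integer compare, which is why a threshold check like needsCompatRpc is one line.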

Afterwards, however, the master kills all the regionservers, like this:
2018-05-03,11:16:39,047 ERROR [RpcServer.default.FPBQ.Fifo.handler=255,queue=15,port=37900] org.apache.hadoop.hbase.master.MasterRpcServices: Region server c4-hadoop-tst-st27.bj,37900,1525316676566 reported a fatal error:
ABORTING region server c4-hadoop-tst-st27.bj,37900,1525316676566: org.apache.hadoop.hbase.YouAreDeadException: Not online: _canary_,33333332,1525317288199.264d6421d08a6744d37be2f257354ca9.

and then the masters kill themselves.
After this, we restarted the HBase cluster, with the masters on 2.0 and the regionservers on 0.98, and the cluster serves OK!
 !Screenshot from 2018-05-03 14-43-46.png! 

I am confused about this.




> [rolling upgrade] 1.x => 2.x
> ----------------------------
>
>                 Key: HBASE-19997
>                 URL: https://issues.apache.org/jira/browse/HBASE-19997
>             Project: HBase
>          Issue Type: Umbrella
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 2.1.0
>
>         Attachments: Screenshot from 2018-05-03 14-43-46.png
>
>
> An umbrella issue for the issues that need fixing so folks can do a rolling upgrade from 
> hbase-1.x to hbase-2.x.
> (Recent) Notables:
>  * hbase-1.x can't read hbase-2.x WALs -- hbase-1.x doesn't know the 
> AsyncProtobufLogWriter class used for writing the WAL -- see 
> https://issues.apache.org/jira/browse/HBASE-19166?focusedCommentId=16362897&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16362897
>  for the exception.
>  ** Might be OK... it means WAL splitting fails on an hbase-1 RS, so we must wait until an 
> hbase-2.x RS picks up the WAL for it to be split.
>  * hbase-1 can't open regions from tables created by hbase-2; it can't find 
> the table descriptor. See 
> https://issues.apache.org/jira/browse/HBASE-19116?focusedCommentId=16363276&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16363276
>  ** This might be OK if the tables we are doing the rolling upgrade over were 
> written with hbase-1.
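
The first notable implies a placement constraint during the upgrade: a WAL written by an hbase-2.x regionserver can only be split by an hbase-2.x regionserver. A minimal sketch of that filter follows. The Server type, eligibleSplitters, and the version packing (major << 20 | minor << 12 | patch) are hypothetical illustrations, not HBase's split-assignment code. It also assumes, beyond what the notable states, that a WAL written by hbase-1.x can be split by any server.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: restrict WAL-split work to servers that can read the WAL.
 */
public class WalSplitEligibility {

    // Hypothetical view of a live regionserver and its packed version number.
    static final class Server {
        final String name;
        final int version;

        Server(String name, int version) {
            this.name = name;
            this.version = version;
        }
    }

    // Assumed packing: major << 20 | minor << 12 | patch.
    static int pack(int major, int minor, int patch) {
        return (major << 20) | (minor << 12) | patch;
    }

    static final int VERSION_2_0 = pack(2, 0, 0);

    /** Names of live servers able to split a WAL written by a server at walWriterVersion. */
    static List<String> eligibleSplitters(List<Server> live, int walWriterVersion) {
        // From the notable: hbase-1.x cannot read hbase-2.x WALs.
        // Assumption: hbase-2.x reads hbase-1.x WALs, so any server qualifies for those.
        final int minReaderVersion = walWriterVersion >= VERSION_2_0 ? VERSION_2_0 : 0;
        return live.stream()
                .filter(s -> s.version >= minReaderVersion)
                .map(s -> s.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical mixed cluster: one 1.x server, one 2.x server.
        List<Server> live = List.of(
                new Server("rs1", pack(1, 4, 3)),
                new Server("rs2", pack(2, 0, 0)));
        System.out.println(eligibleSplitters(live, pack(2, 0, 0))); // [rs2]
        System.out.println(eligibleSplitters(live, pack(1, 4, 3))); // [rs1, rs2]
    }
}
```

The same filter shape would fit the second notable: regions of tables created by hbase-2 would only be placed on hbase-2.x servers.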



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
