[
https://issues.apache.org/jira/browse/HBASE-28366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818310#comment-17818310
]
Viraj Jasani commented on HBASE-28366:
--------------------------------------
{quote}And for regionServerReport, we used to make use of the information to
fix inconsistencies, but it introduced a bunch of new inconsistencies, so IIRC
we finally chose to only log the inconsistency instead of fixing it in
regionServerReport, or at least we will not fix it immediately
{quote}
I see, that makes sense. As part of this Jira, we could still reject the rs
report from the old server, i.e. make that check more stringent by not only
relying on the dead server map but also comparing against the online server
map: if we find a new online server with the same host + port but a higher
startcode, we reject the rs report from the old server immediately. As of
today, we accept it after logging a warning, and that results in
inconsistencies.
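For illustration, a minimal self-contained sketch of what that stricter check
could look like; the class and field names below are assumptions for this
sketch, not actual HBase internals or a proposed patch:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch, assuming a host:port -> startcode view of online servers.
final class StrictReportCheck {
  private final Map<String, Long> onlineStartcodeByAddress = new ConcurrentHashMap<>();

  void onServerOnline(String hostAndPort, long startcode) {
    // Keep the highest startcode seen for this host:port.
    onlineStartcodeByAddress.merge(hostAndPort, startcode, Math::max);
  }

  /**
   * Returns true if the report must be rejected: a newer server with the
   * same host + port (higher startcode) is already online, so the reporting
   * server is stale even if it is not yet in the dead server map.
   */
  boolean shouldRejectReport(String hostAndPort, long reportedStartcode) {
    Long online = onlineStartcodeByAddress.get(hostAndPort);
    return online != null && online > reportedStartcode;
  }
}
{code}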
{quote}I think there could be a race where the master thinks a region server
is dead, but the region server is still alive.
{quote}
That is correct for this case. As per the logs, at 16:50:33, the master
scheduled the SCP due to removal of the ephemeral rs znode.
From the master side, the SCP was completed at 16:53:01:
{code:java}
2024-01-29 16:53:01,640 INFO [PEWorker-39] procedure2.ProcedureExecutor -
Finished pid=9812440, state=SUCCESS; ServerCrashProcedure
hbase2a-dnds1-114-ia6.ops.sfdc.net,61020,1706541866103, splitWal=true,
meta=false in 2 mins, 27.679 sec{code}
From the regionserver side, at 16:54:27, we saw the first occurrence of:
{code:java}
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master{code}
{quote}But after SCP, the region server cannot accept any write requests
anymore, although it could still serve read requests.
{quote}
That's because of the WAL splitting done by the master, correct?
> Mis-order of SCP and regionServerReport results in region inconsistencies
> -------------------------------------------------------------------------
>
> Key: HBASE-28366
> URL: https://issues.apache.org/jira/browse/HBASE-28366
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.4.17, 3.0.0-beta-1, 2.5.7
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
>
> If the regionserver is online but its rs ephemeral node gets deleted in
> zookeeper due to a network issue, the active master schedules an SCP. However,
> since the regionserver is still alive, it can still send regionServerReport
> to the active master. When the SCP assigns the regions that were previously
> hosted on the old regionserver (which is still alive) to other regionservers,
> the old rs can continue to send regionServerReport to the active master.
> Eventually this results in region inconsistencies, because a region appears
> alive on two regionservers at the same time (though this is a temporary state,
> because the old rs will be aborted soon). While the old regionserver has
> zookeeper connectivity issues, it can still make rpc calls to the active
> master.
> Logs:
> SCP:
> {code:java}
> 2024-01-29 16:50:33,956 INFO [RegionServerTracker-0]
> assignment.AssignmentManager - Scheduled ServerCrashProcedure pid=9812440 for
> server1-114.xyz,61020,1706541866103 (carryingMeta=false)
> server1-114.xyz,61020,1706541866103/CRASHED/regionCount=364/lock=java.util.concurrent.locks.ReentrantReadWriteLock@5d5fc31[Write
> locks = 1, Read locks = 0], oldState=ONLINE.
> 2024-01-29 16:50:33,956 DEBUG [RegionServerTracker-0]
> procedure2.ProcedureExecutor - Stored pid=9812440,
> state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure
> server1-114.xyz,61020,1706541866103, splitWal=true, meta=false
> 2024-01-29 16:50:33,973 INFO [PEWorker-36] procedure.ServerCrashProcedure -
> Splitting WALs pid=9812440, state=RUNNABLE:SERVER_CRASH_SPLIT_LOGS,
> locked=true; ServerCrashProcedure server1-114.xyz,61020,1706541866103,
> splitWal=true, meta=false, isMeta: false
> {code}
> As part of the SCP, region d743ace9f70d55f55ba1ecc6dc49a5cb was assigned to
> another server:
>
> {code:java}
> 2024-01-29 16:50:42,656 INFO [PEWorker-24] procedure.MasterProcedureScheduler
> - Took xlock for pid=9818494, ppid=9812440,
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
> TransitRegionStateProcedure
> table=PLATFORM_ENTITY.PLATFORM_IMMUTABLE_ENTITY_DATA,
> region=d743ace9f70d55f55ba1ecc6dc49a5cb, ASSIGN
> 2024-01-29 16:50:43,106 INFO [PEWorker-23] assignment.RegionStateStore -
> pid=9818494 updating hbase:meta row=d743ace9f70d55f55ba1ecc6dc49a5cb,
> regionState=OPEN, repBarrier=12867482, openSeqNum=12867482,
> regionLocation=server1-65.xyz,61020,1706165574050
> {code}
>
> rs abort, after ~5 min:
> {code:java}
> 2024-01-29 16:54:27,235 ERROR [regionserver/server1-114:61020]
> regionserver.HRegionServer - ***** ABORTING region server
> server1-114.xyz,61020,1706541866103: Unexpected exception handling getData
> *****
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1229)
> at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:414)
> at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:403)
> at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:367)
> at org.apache.hadoop.hbase.zookeeper.ZKNodeTracker.getData(ZKNodeTracker.java:180)
> at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:152)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionServerStatusStub(HRegionServer.java:2892)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:1352)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1142)
> {code}
>
> Several failed region transition report logs:
> {code:java}
> 2024-01-29 16:55:13,029 INFO [_REGION-regionserver/server1-114:61020-0]
> regionserver.HRegionServer - Failed report transition server { host_name:
> "server1-114.xyz" port: 61020 start_code: 1706541866103 } transition {
> transition_code: CLOSED region_info { region_id: 1671555604277 table_name {
> namespace: "default" qualifier: "TABLE1" } start_key: "abc" end_key: "xyz"
> offline: false split: false replica_id: 0 } proc_id: -1 }; retry (#0)
> immediately.
> java.net.UnknownHostException: Call to address=master-server1.xyz:61000 failed on local exception: java.net.UnknownHostException: master-server1.xyz:61000 could not be resolved
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:239)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:391)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:92)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:425)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:420)
> at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:114)
> at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:129)
> at org.apache.hadoop.hbase.ipc.NettyRpcConnection.lambda$sendRequest$4(NettyRpcConnection.java:365)
> at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
> at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
> at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
> at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:403)
> at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
> at org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:750)
> Caused by: java.net.UnknownHostException: master-server1.xyz:61000 could not be resolved
> at org.apache.hadoop.hbase.ipc.RpcConnection.getRemoteInetAddress(RpcConnection.java:192)
> at org.apache.hadoop.hbase.ipc.NettyRpcConnection.connect(NettyRpcConnection.java:280)
> at org.apache.hadoop.hbase.ipc.NettyRpcConnection.access$900(NettyRpcConnection.java:79)
> at org.apache.hadoop.hbase.ipc.NettyRpcConnection$5.run(NettyRpcConnection.java:340)
> at org.apache.hadoop.hbase.ipc.HBaseRpcControllerImpl.notifyOnCancel(HBaseRpcControllerImpl.java:259)
> at org.apache.hadoop.hbase.ipc.NettyRpcConnection.sendRequest0(NettyRpcConnection.java:323)
> at org.apache.hadoop.hbase.ipc.NettyRpcConnection.lambda$sendRequest$4(NettyRpcConnection.java:363)
> ... 8 more
> {code}
>
> The server was started again after some time:
> {code:java}
> Mon Jan 29 17:01:35 UTC 2024 Starting regionserver on server1-114.xyz
> core file size (blocks, -c) unlimited
> ...
> ... {code}
>
> In order to avoid region inconsistencies, we should clear the regionserver ->
> region map reported to the active master. We can do it in two ways:
> # Active: the master should throw an error while serving the
> regionServerReport call if the server is already considered dead due to the
> deletion of its ephemeral znode.
> # Passive: run a chore that compares the regionserver -> region map generated
> from the regionserver reports with the actual live servers based on the zk
> tracking, and removes any rs report that is no longer valid (a minimal sketch
> of such a chore follows this list).
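> For illustration, a minimal self-contained sketch of the passive chore from
> option 2; every name below is an assumption for this sketch, not actual HBase
> internals:
> {code:java}
> import java.util.Map;
> import java.util.Set;
> import java.util.concurrent.ConcurrentHashMap;
>
> // Minimal sketch: periodically drop rs reports whose server no longer has a
> // live ephemeral znode in zk.
> final class StaleReportChore implements Runnable {
>   // regionserver -> region count from its last regionServerReport (assumed)
>   private final Map<String, Integer> reportedRegionCounts = new ConcurrentHashMap<>();
>   // live servers according to zk ephemeral znode tracking (assumed fresh)
>   private final Set<String> liveServersFromZk;
>
>   StaleReportChore(Set<String> liveServersFromZk) {
>     this.liveServersFromZk = liveServersFromZk;
>   }
>
>   @Override
>   public void run() {
>     // Remove any rs report whose server is no longer tracked as live in zk.
>     reportedRegionCounts.keySet().removeIf(sn -> !liveServersFromZk.contains(sn));
>   }
> }
> {code}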