[ https://issues.apache.org/jira/browse/HBASE-27230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chenglei updated HBASE-27230:
-----------------------------
Description:
As HBASE-27223 said, if {{WAL.sync}} gets a timeout exception, we should abort
the region server, because WAL sync is designed to either succeed or die; there
is no 'failure'. This is usually not a big deal because we set a very large
default value (5 minutes) for {{AbstractFSWAL.WAL_SYNC_TIMEOUT_MS}}, and usually
the WAL system will abort the region server if it cannot finish the sync within
5 minutes.
In the PR, the regionServer is always aborted only for the {{WAL.sync}} timeout
in {{HRegion#doWALAppend}}. {{WALUtil.writeMarker}} just records the internal
state, so there seems to be no need to always abort the regionServer when
{{WAL.sync}} times out there; it is the internal state transition that
determines whether the regionServer is aborted.
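To illustrate the two policies described above, here is a minimal, self-contained sketch. It is not the code in the PR: {{Wal}}, {{Abortable}}, {{SyncTimeoutException}} and {{WalSyncPolicy}} are hypothetical stand-ins that only model the intended behavior, with no dependency on HBase classes.

```java
import java.io.IOException;

/** Hypothetical stand-in for the timeout exception thrown by a WAL sync. */
class SyncTimeoutException extends IOException {
    SyncTimeoutException(String message) {
        super(message);
    }
}

/** Hypothetical minimal WAL: sync either succeeds or throws. */
interface Wal {
    void sync() throws IOException;
}

/** Hypothetical abort hook, standing in for the region server. */
interface Abortable {
    void abort(String why, Throwable cause);
}

/** Sketch of the two policies: always abort vs. let the caller decide. */
class WalSyncPolicy {
    private final Wal wal;
    private final Abortable server;

    WalSyncPolicy(Wal wal, Abortable server) {
        this.wal = wal;
        this.server = server;
    }

    /** Data-edit path (modeled on the doWALAppend case): a sync timeout always aborts. */
    void appendAndSync() throws IOException {
        try {
            wal.sync();
        } catch (SyncTimeoutException e) {
            // The edit may or may not be durable, so the only safe outcome is to die.
            server.abort("WAL sync timed out", e);
            throw e;
        }
    }

    /** Marker path (modeled on the writeMarker case): no abort here. */
    void writeMarker() throws IOException {
        // The timeout propagates; the caller's state transition decides whether to abort.
        wal.sync();
    }
}
```

With a {{Wal}} whose {{sync}} always times out, {{appendAndSync}} invokes the abort hook once and rethrows, while {{writeMarker}} only rethrows and leaves the abort decision to its caller.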
was:
As HBASE-27223 said, if {{WAL.sync}} get a timeout exception, we should abort
the region server, as the design of WAL sync, is to succeed or die, there is no
'failure'. It is usually not a big deal is because we set a very large default
value(5 minutes) for {{AbstractFSWAL.WAL_SYNC_TIMEOUT_MS}}, usually the WAL
system will abort the region server if it can not finish the sync within 5
minutes.
In the PR, only the {{WAL.sync}} timeout in {{HRegion#doWALAppend}}
,RegionServer is always aborted. For {{WALUtil.writeMarker}}, it is just record
the internal state and seems it is no need to always abort the regionServer
when {{WAL.sync}} timeout,it is the internal state transition that determines
whether regionServer Abort is made.
> RegionServer should be aborted when WAL.sync throws TimeoutIOException
> ----------------------------------------------------------------------
>
> Key: HBASE-27230
> URL: https://issues.apache.org/jira/browse/HBASE-27230
> Project: HBase
> Issue Type: Bug
> Components: wal
> Affects Versions: 3.0.0-alpha-4
> Reporter: chenglei
> Assignee: chenglei
> Priority: Major
>
> As HBASE-27223 said, if {{WAL.sync}} gets a timeout exception, we should
> abort the region server, because WAL sync is designed to either succeed or
> die; there is no 'failure'. This is usually not a big deal because we set a
> very large default value (5 minutes) for
> {{AbstractFSWAL.WAL_SYNC_TIMEOUT_MS}}, and usually the WAL system will abort
> the region server if it cannot finish the sync within 5 minutes.
> In the PR, the regionServer is always aborted only for the {{WAL.sync}}
> timeout in {{HRegion#doWALAppend}}. {{WALUtil.writeMarker}} just records the
> internal state, so there seems to be no need to always abort the
> regionServer when {{WAL.sync}} times out there; it is the internal state
> transition that determines whether the regionServer is aborted.