[ https://issues.apache.org/jira/browse/HBASE-27230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chenglei updated HBASE-27230:
-----------------------------
Description:
As HBASE-27223 noted, if {{WAL.sync}} throws a timeout exception, the only
correct response is to abort the region server: by design, WAL sync must either
succeed or die; there is no 'failure'. This is usually not a big deal because
the default value of {{AbstractFSWAL.WAL_SYNC_TIMEOUT_MS}} is very large
(5 minutes), and the WAL system will usually abort the region server if it
cannot finish the sync within 5 minutes.
In the PR, the RegionServer is always aborted only when {{WAL.sync}} times out
in {{HRegion#doWALAppend}}. {{WALUtil.writeMarker}}, by contrast, only records
internal status, so there seems to be no need to always abort the RegionServer
when {{WAL.sync}} times out there.
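A minimal sketch of the succeed-or-die rule, assuming a plain {{CompletableFuture}} stands in for the pending sync and a hypothetical {{Aborter}} callback stands in for aborting the region server; {{timeoutMs}} plays the role of {{AbstractFSWAL.WAL_SYNC_TIMEOUT_MS}}. The local {{TimeoutIOException}} class is a stand-in, and none of this is the actual {{AbstractFSWAL}} code.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SyncOrDieSketch {

  /** Hypothetical stand-in for HBase's TimeoutIOException (not the real class). */
  static class TimeoutIOException extends IOException {
    TimeoutIOException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Hypothetical hook standing in for aborting the region server. */
  interface Aborter {
    void abort(String why, Throwable cause);
  }

  /** Waits at most timeoutMs for the pending sync; a timeout becomes TimeoutIOException. */
  static void blockOnSync(CompletableFuture<Void> pendingSync, long timeoutMs)
      throws IOException {
    try {
      pendingSync.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      throw new TimeoutIOException("WAL sync not finished within " + timeoutMs + " ms", e);
    } catch (ExecutionException e) {
      // The sync itself failed; surface the underlying cause as an IOException.
      throw new IOException(e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new InterruptedIOException("interrupted while waiting for WAL sync");
    }
  }

  /** Succeed or die: there is no 'failure' outcome, so a timeout aborts and then rethrows. */
  static void syncOrAbort(CompletableFuture<Void> pendingSync, long timeoutMs, Aborter aborter)
      throws IOException {
    try {
      blockOnSync(pendingSync, timeoutMs);
    } catch (TimeoutIOException e) {
      aborter.abort("WAL sync timed out", e);
      throw e;
    }
  }

  public static void main(String[] args) throws IOException {
    Aborter aborter = (why, cause) -> System.out.println("ABORT: " + why);
    // A sync that has already completed returns normally and does not abort.
    syncOrAbort(CompletableFuture.completedFuture(null), 100, aborter);
    // A sync that never completes times out, triggers the abort, and is rethrown.
    try {
      syncOrAbort(new CompletableFuture<Void>(), 100, aborter);
    } catch (TimeoutIOException e) {
      System.out.println("caller sees: " + e.getMessage());
    }
  }
}
```

Only the second call in {{main}} aborts; the first sync has already completed, so it returns without touching the abort hook.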
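The distinction drawn in the PR can be sketched as two call sites sharing one hypothetical {{Wal}} interface: {{appendAndSync}} plays the role of {{HRegion#doWALAppend}} and always aborts on a sync timeout, while {{writeMarkerAndSync}} plays the role of {{WALUtil.writeMarker}} and only propagates the exception. All names here, including the local {{TimeoutIOException}} and {{Aborter}}, are illustrative assumptions rather than the HBase API.

```java
import java.io.IOException;

final class SyncTimeoutPolicySketch {

  /** Hypothetical stand-in for HBase's TimeoutIOException (not the real class). */
  static class TimeoutIOException extends IOException {
    TimeoutIOException(String message) {
      super(message);
    }
  }

  /** Hypothetical minimal view of a WAL: only sync(txid) matters here. */
  interface Wal {
    void sync(long txid) throws IOException;
  }

  /** Hypothetical hook standing in for aborting the region server. */
  interface Aborter {
    void abort(String why, Throwable cause);
  }

  /** Data path (role of HRegion#doWALAppend): a sync timeout always aborts. */
  static void appendAndSync(Wal wal, long txid, Aborter aborter) throws IOException {
    try {
      wal.sync(txid);
    } catch (TimeoutIOException e) {
      // Succeed or die: a timed-out sync in the data path aborts, then the error is rethrown.
      aborter.abort("WAL sync timed out in append path", e);
      throw e;
    }
  }

  /** Marker path (role of WALUtil.writeMarker): the timeout propagates, no forced abort. */
  static void writeMarkerAndSync(Wal wal, long txid) throws IOException {
    // Markers only record internal status, so the caller decides what to do on timeout.
    wal.sync(txid);
  }

  public static void main(String[] args) {
    Wal stuckWal = txid -> {
      throw new TimeoutIOException("sync of txid " + txid + " timed out");
    };
    Aborter aborter = (why, cause) -> System.out.println("ABORT: " + why);
    try {
      appendAndSync(stuckWal, 1L, aborter);
    } catch (IOException e) {
      System.out.println("append failed: " + e.getMessage());
    }
    try {
      writeMarkerAndSync(stuckWal, 2L);
    } catch (IOException e) {
      System.out.println("marker failed: " + e.getMessage());
    }
  }
}
```

Keeping the abort decision at the call site, rather than inside {{sync}}, is what lets the marker path skip the abort while the append path keeps the succeed-or-die guarantee.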
was: As HBASE-27223 noted, if {{WAL.sync}} throws a timeout exception, the
only correct response is to abort the region server: by design, WAL sync must
either succeed or die; there is no 'failure'. This is usually not a big deal
because the default value of {{AbstractFSWAL.WAL_SYNC_TIMEOUT_MS}} is very
large (5 minutes), and the WAL system will usually abort the region server if
it cannot finish the sync within 5 minutes.
> RegionServer should be aborted when WAL.sync throws TimeoutIOException
> ----------------------------------------------------------------------
>
> Key: HBASE-27230
> URL: https://issues.apache.org/jira/browse/HBASE-27230
> Project: HBase
> Issue Type: Bug
> Components: wal
> Affects Versions: 3.0.0-alpha-4
> Reporter: chenglei
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)