[
https://issues.apache.org/jira/browse/HBASE-28400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bryan Beaudreault updated HBASE-28400:
--------------------------------------
Description:
In HBASE-28390, I found a bug in our WAL compression which manifests as an
IllegalArgumentException or ArrayIndexOutOfBoundsException. Even worse,
ProtobufLogReader.readNext catches any Exception and rethrows it as an
EOFException. Readers of WALs handle EOFException in a variety of ways, and
not all of them make sense for an exception that isn't really EOF.
For example, WALInputFormat catches EOFException and returns false for
nextKeyValue(), effectively skipping the rest of the WAL file without failing
the job.
ReplicationSourceWALReader has much more complicated handling of
EOFException.
RegionServer replayRecoveredEdits stops trying to read the WAL, which might
mean data loss.
We should probably do a better job of handling exceptions that are not really
EOF.
> WAL readers treat any exception as EOFException, which can lead to data loss
> ----------------------------------------------------------------------------
>
> Key: HBASE-28400
> URL: https://issues.apache.org/jira/browse/HBASE-28400
> Project: HBase
> Issue Type: Bug
> Reporter: Bryan Beaudreault
> Priority: Major
>
> In HBASE-28390, I found a bug in our WAL compression which manifests as an
> IllegalArgumentException or ArrayIndexOutOfBoundsException. Even worse,
> ProtobufLogReader.readNext catches any Exception and rethrows it as an
> EOFException. Readers of WALs handle EOFException in a variety of ways, and
> not all of them make sense for an exception that isn't really EOF.
> For example, WALInputFormat catches EOFException and returns false for
> nextKeyValue(), effectively skipping the rest of the WAL file without failing
> the job.
> ReplicationSourceWALReader has much more complicated handling of
> EOFException.
> RegionServer replayRecoveredEdits stops trying to read the WAL, which might
> mean data loss.
> We should probably do a better job of handling exceptions that are not really
> EOF.
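
To make the problem concrete, here is a minimal, self-contained sketch of the masking pattern described above, next to one possible alternative. It is not the HBase code: WalReadSketch, EntryDecoder, readNextMasking, readNextDistinguishing and classify are hypothetical names invented for illustration, and the alternative is only one way the handling could be split, not a proposed patch.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical sketch; none of these names are HBase APIs. */
public class WalReadSketch {

  /** Stand-in for a WAL entry decoder that may hit a real EOF or a decoding bug. */
  public interface EntryDecoder {
    String decodeNext() throws IOException;
  }

  /** The pattern from the description: every failure is reported as EOF. */
  public static String readNextMasking(EntryDecoder decoder) throws IOException {
    try {
      return decoder.decodeNext();
    } catch (Exception e) {
      // A decoding bug (e.g. ArrayIndexOutOfBoundsException) is now
      // indistinguishable from reaching the end of the file.
      EOFException eof = new EOFException("Failed to read entry: " + e);
      eof.initCause(e);
      throw eof;
    }
  }

  /** One possible alternative (an assumption): only a genuine EOF stays an EOFException. */
  public static String readNextDistinguishing(EntryDecoder decoder) throws IOException {
    try {
      return decoder.decodeNext();
    } catch (RuntimeException e) {
      // Surface decoding bugs as a plain IOException so callers do not
      // silently treat them as end of file. EOFException and other
      // IOExceptions from the decoder propagate unchanged.
      throw new IOException("Error decoding WAL entry (not EOF)", e);
    }
  }

  /** Reports how a caller that special-cases EOFException would see the read. */
  public static String classify(boolean masking, EntryDecoder decoder) {
    try {
      if (masking) {
        readNextMasking(decoder);
      } else {
        readNextDistinguishing(decoder);
      }
      return "ok";
    } catch (EOFException e) {
      return "eof"; // caller would stop reading, possibly skipping data
    } catch (IOException e) {
      return "error"; // caller would fail loudly
    }
  }
}
```

With a decoder that throws ArrayIndexOutOfBoundsException, classify(true, ...) reports "eof" while classify(false, ...) reports "error". A caller shaped like the WALInputFormat behavior described above would therefore skip the rest of the file in the first case and fail the job in the second.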