[ 
https://issues.apache.org/jira/browse/HBASE-28400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault updated HBASE-28400:
--------------------------------------
    Description: 
In HBASE-28390, I found a bug in our WAL compression which manifests as an 
IllegalArgumentException or ArrayIndexOutOfBoundsException. Even worse, 
ProtobufLogReader.readNext catches any Exception and rethrows it as an 
EOFException. Readers of WALs handle EOFException in a variety of ways, and 
not all of them make sense for an exception that isn't really EOF.

For example, WALInputFormat catches EOFException and returns false for 
nextKeyValue(), effectively skipping the rest of the WAL file but not failing 
the job.

ReplicationSourceWALReader has much more complicated handling of 
EOFException.

RegionServer replayRecoveredEdits stops trying to read the WAL, which might 
mean data loss.

We should probably handle these non-EOF exceptions differently from a real EOF.
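
To illustrate the failure mode, here is a minimal sketch in plain java.io. It does not use any HBase classes: WalReadSketch, decode, readNextMasking, readNextStrict and countEntries are hypothetical names, and the "WAL" is just a sequence of ints in which a negative value stands in for an entry that trips a decoding bug. It only shows how rethrowing every Exception as EOFException lets a consumer silently drop the rest of the file, and how rethrowing only a genuine EOFException surfaces the problem instead.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

class WalReadSketch {
  /** Encodes the given ints back to back, as a stand-in for a WAL file. */
  static byte[] encode(int... values) {
    ByteBuffer buf = ByteBuffer.allocate(4 * values.length);
    for (int v : values) {
      buf.putInt(v);
    }
    return buf.array();
  }

  /** Hypothetical decode step; a negative value simulates a decoding bug. */
  static int decode(int raw) {
    if (raw < 0) {
      throw new IllegalArgumentException("corrupt entry: " + raw);
    }
    return raw;
  }

  /** Masking pattern: every failure is reported to the caller as EOF. */
  static int readNextMasking(DataInputStream in) throws IOException {
    try {
      return decode(in.readInt());
    } catch (Exception e) {
      EOFException eof = new EOFException("treated as EOF: " + e);
      eof.initCause(e);
      throw eof;
    }
  }

  /** Alternative: only a genuine EOF stays an EOFException. */
  static int readNextStrict(DataInputStream in) throws IOException {
    try {
      return decode(in.readInt());
    } catch (EOFException e) {
      throw e; // real end of stream
    } catch (RuntimeException e) {
      throw new IOException("entry could not be decoded", e);
    }
  }

  /** Consumer that, like many readers, treats EOFException as "done". */
  static int countEntries(byte[] data, boolean strict) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    int count = 0;
    while (true) {
      try {
        if (strict) {
          readNextStrict(in);
        } else {
          readNextMasking(in);
        }
        count++;
      } catch (EOFException e) {
        return count;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] wal = encode(1, -1, 2); // second entry is "corrupt"
    // Masking reader: stops after the first entry, entry 2 is lost silently.
    System.out.println(countEntries(wal, false)); // prints 1
    try {
      countEntries(wal, true);
    } catch (IOException e) {
      // Strict reader: the decoding failure reaches the caller.
      System.out.println("strict reader failed: " + e.getMessage());
    }
  }
}
```

With the masking reader the consumer cannot tell a truncated file from a decoding bug, which is the situation described above. Whether a non-EOF failure should abort, retry, or be skipped is a separate per-reader decision that this sketch does not try to settle.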

  was:
In HBASE-28390, I found a bug in our WAL compression which manifests as an 
IllegalArgumentException or ArrayIndexOutOfBoundsException. Even worse, 
ProtobufLogReader.readNext catches any Exception and rethrows it as an 
EOFException. Readers of WALs handle EOFException in a variety of ways, and 
not all of them make sense for an exception that isn't really EOF.

For example, WALInputFormat catches EOFException and returns false for 
nextKeyValue(), effectively skipping the rest of the WAL file but not failing 
the job.

ReplicationSourceWALReader has much more complicated handling of 
EOFException.


> WAL readers treat any exception as EOFException, which can lead to data loss
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-28400
>                 URL: https://issues.apache.org/jira/browse/HBASE-28400
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Bryan Beaudreault
>            Priority: Major
>
> In HBASE-28390, I found a bug in our WAL compression which manifests as an 
> IllegalArgumentException or ArrayIndexOutOfBoundsException. Even worse, 
> ProtobufLogReader.readNext catches any Exception and rethrows it as an 
> EOFException. Readers of WALs handle EOFException in a variety of ways, and 
> not all of them make sense for an exception that isn't really EOF.
> For example, WALInputFormat catches EOFException and returns false for 
> nextKeyValue(), effectively skipping the rest of the WAL file but not failing 
> the job.
> ReplicationSourceWALReader has much more complicated handling of 
> EOFException.
> RegionServer replayRecoveredEdits stops trying to read the WAL, which might 
> mean data loss.
> We should probably handle these non-EOF exceptions differently from a real 
> EOF.



