[ 
https://issues.apache.org/jira/browse/HBASE-27644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-27644.
-------------------------------
    Fix Version/s: 2.6.0
                   3.0.0-alpha-4
                   2.4.17
                   2.5.4
     Hadoop Flags: Reviewed
       Resolution: Fixed

Pushed to branch-2.4+.

Thanks [~vjasani] for reviewing!

> Should not return false when WALKey has no following KVs while reading WAL 
> file
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-27644
>                 URL: https://issues.apache.org/jira/browse/HBASE-27644
>             Project: HBase
>          Issue Type: Bug
>          Components: dataloss, wal
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Critical
>             Fix For: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.4
>
>
> In the current implementation
> {code}
>       if (!walKey.hasFollowingKvCount() || 0 == walKey.getFollowingKvCount()) {
>         LOG.trace("WALKey has no KVs that follow it; trying the next one. current offset={}",
>           this.inputStream.getPos());
>         seekOnFs(originalPosition);
>         return false;
>       }
> {code}
> Here we seek back to the original position and return false. I think the 
> intention is to signal that the data is not available yet and that we 
> should try to read it again next time.
> But this class is not only used for replication, it is also used for 
> splitting. Returning false makes reader.next return null, so WALSplitter 
> will think the WAL file has been fully read and complete the splitting 
> task. If there are still other WAL entries in the file, we will miss them 
> and lose data.
> In fact, the following kv count is a field in a pb message, so it is 
> impossible for it to be 0 now and become greater than 0 later. We use 
> writeDelimited to write the message, so there is a size in front of the 
> message; if we read it successfully, we can be sure the message is 
> complete. So seeking back in replication is also a useless operation.
> So we propose returning true here. The upper layers are then free to skip 
> the entry or not, but they will still go on to read the other entries.
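
The difference between the two behaviors can be sketched in isolation. This is a simplified illustration only, not the actual HBase reader or the committed patch: WalSkipSketch, Entry, Reader, split and the stopOnEmpty flag are all hypothetical names, and the seek back to the original position is not modeled. It only shows how a splitter-style loop that stops at the first null loses the entries after an empty WALKey.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model of the problem described in the issue.
final class WalSkipSketch {

  // Stand-in for a WAL entry: a key plus the number of KVs that follow it.
  static final class Entry {
    final String key;
    final int followingKvCount;

    Entry(String key, int followingKvCount) {
      this.key = key;
      this.followingKvCount = followingKvCount;
    }
  }

  // Stand-in reader: next() returns null at end of file and, when
  // stopOnEmpty is set, also when an entry has no following KVs.
  static final class Reader {
    private final Iterator<Entry> it;
    private final boolean stopOnEmpty;

    Reader(List<Entry> file, boolean stopOnEmpty) {
      this.it = file.iterator();
      this.stopOnEmpty = stopOnEmpty;
    }

    Entry next() {
      if (!it.hasNext()) {
        return null; // genuine end of file
      }
      Entry e = it.next();
      if (stopOnEmpty && e.followingKvCount == 0) {
        return null; // old behavior: looks exactly like end of file
      }
      return e; // proposed behavior: let the caller decide what to do
    }
  }

  // Splitter-style loop: reads until next() returns null and keeps
  // only the keys of entries that actually carry KVs.
  static List<String> split(List<Entry> file, boolean stopOnEmpty) {
    Reader reader = new Reader(file, stopOnEmpty);
    List<String> recovered = new ArrayList<>();
    for (Entry e = reader.next(); e != null; e = reader.next()) {
      if (e.followingKvCount > 0) {
        recovered.add(e.key);
      }
    }
    return recovered;
  }

  public static void main(String[] args) {
    List<Entry> file =
        Arrays.asList(new Entry("a", 2), new Entry("b", 0), new Entry("c", 1));
    System.out.println(split(file, true));  // prints [a]
    System.out.println(split(file, false)); // prints [a, c]
  }
}
```

With stopOnEmpty set, entry "c" is never read even though it is present in the file, which is the data loss scenario described above. With it cleared, the caller skips the empty entry and still reaches "c".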



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
