Sun Xin created HBASE-27354:
-------------------------------

             Summary: EOF thrown by WALEntryStream causes replication blocking
                 Key: HBASE-27354
                 URL: https://issues.apache.org/jira/browse/HBASE-27354
             Project: HBase
          Issue Type: Bug
          Components: Replication
    Affects Versions: 2.4.14, 3.0.0-alpha-3, 2.5.0, 2.6.0
            Reporter: Sun Xin
            Assignee: Sun Xin


In 
[WALEntryStream#readNextEntryAndRecordReaderPosition|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java#L257],
 it is possible that we read uncommitted data. If we read beyond the committed 
file length, we reopen the inputStream and seek back.

In our deployment, we found that the seek-back position may be exactly the 
length of the file being written, which may cause an EOF exception.
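
As a rough illustration of why a seek-back position equal to the file length is 
a problem, here is a minimal sketch using a local file and RandomAccessFile. It 
is only an analogue: the class name SeekToEndDemo and the helper readHitsEof are 
hypothetical, and the real code path uses the WAL reader on top of the 
filesystem input stream rather than RandomAccessFile.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class SeekToEndDemo {
  /** Hypothetical helper: true if reading a 4-byte record at position hits EOF. */
  static boolean readHitsEof(Path file, long position) throws IOException {
    try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
      in.seek(position); // seeking to the file length itself succeeds
      in.readInt();      // the read is what fails when nothing is left
      return false;
    } catch (EOFException e) {
      return true;
    }
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("wal-demo", ".bin");
    try {
      Files.write(file, new byte[] {0, 0, 0, 42}); // one 4-byte "entry"
      long length = Files.size(file);
      System.out.println(readHitsEof(file, 0));      // prints false
      System.out.println(readHitsEof(file, length)); // prints true
    } finally {
      Files.delete(file);
    }
  }
}
```

The seek itself does not fail; the exception only appears on the next read, 
which is why the caller has to be prepared for it.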

The thrown EOF is finally caught in 
[ReplicationSourceWALReader.run|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L158],
 but 
[totalBufferUsed|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L78]
 is not cleaned up.
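
The leak can be sketched in isolation as follows. This is a simplified model, 
not the HBase code and not necessarily how the issue should be fixed: the class 
BufferQuotaDemo, the method readBatch and the releaseOnError flag are all 
hypothetical. Only the idea of a shared counter named totalBufferUsed comes from 
the report.

```java
import java.io.EOFException;
import java.util.concurrent.atomic.AtomicLong;

public class BufferQuotaDemo {
  /**
   * Hypothetical reader step: reserves quota for each entry in a batch, then
   * hits EOF before the batch can be shipped. Returns the counter afterwards.
   */
  static long readBatch(AtomicLong totalBufferUsed, int entries, long entrySize,
      boolean releaseOnError) {
    long reserved = 0; // quota taken by this batch only
    try {
      for (int i = 0; i < entries; i++) {
        totalBufferUsed.addAndGet(entrySize);
        reserved += entrySize;
      }
      // Assumption: the seek-back landed on the file length, so the read fails.
      throw new EOFException("seek position equals file length");
    } catch (EOFException e) {
      // The batch is dropped here. Without the release below, its quota
      // stays counted in the shared total forever.
      if (releaseOnError) {
        totalBufferUsed.addAndGet(-reserved);
      }
    }
    return totalBufferUsed.get();
  }

  public static void main(String[] args) {
    System.out.println(readBatch(new AtomicLong(), 3, 64, false)); // prints 192
    System.out.println(readBatch(new AtomicLong(), 3, 64, true));  // prints 0
  }
}
```

Because the counter is shared, every leaked batch reduces the quota available 
to all other readers, not just the one that hit the EOF.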

Over a long run, the totalBufferUsed that is never released accumulates, so all 
peers slow down and eventually block completely.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
