[GitHub] [hbase] wchevreuil commented on a diff in pull request #4756: HBASE-27354 EOF thrown by WALEntryStream causes replication blocking

GitBox Tue, 06 Sep 2022 05:48:09 -0700


wchevreuil commented on code in PR #4756:
URL: https://github.com/apache/hbase/pull/4756#discussion_r963639465



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java:
##########
@@ -255,15 +255,21 @@ private void dequeueCurrentLog() throws IOException {
    * Returns whether the file is opened for writing.
    */
   private boolean readNextEntryAndRecordReaderPosition() throws IOException {
+    long prePos = reader.getPosition();

Review Comment:
   > It's not a big problem if it just throws EOF as we'll retry. The big
   > problem is that if we have read some entries into the WALEntryBatch and
   > increased the totalBufferUsed, and the totalBufferUsed is not subtracted after
   > throwing EOF, all peers will eventually block completely.
   
   One of our customers seems to be hitting this EOF problem consistently, per
the exception trace below.
   
   `WARN org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader: Encountered a malformed edit, seeking back to last good position in file, from 31912404 to 31912265
   java.io.EOFException: EOF while reading 106 WAL KVs; started reading at 31912338 and read up to 31912404
   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:397)
   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:98)
   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:86)
   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.readNextEntryAndRecordReaderPosition(WALEntryStream.java:262)
   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:176)
   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:101)`
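
To make the quoted buffer-accounting concern concrete, here is a minimal, hypothetical sketch. It is not the HBase code and not the change in this PR: `BufferQuotaSketch`, `EntrySource`, and `readBatch` are invented names, and the static `totalBufferUsed` counter only stands in for the shared quota the quote refers to. It illustrates the general idea that bytes charged for a partially read batch should be released if EOF interrupts the read, so that retries do not leak quota.

```java
import java.io.EOFException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; none of these names are real HBase classes. */
public class BufferQuotaSketch {

  /** Stand-in for the shared counter the quote calls totalBufferUsed. */
  public static final AtomicLong totalBufferUsed = new AtomicLong();

  /** Hypothetical entry source that may fail with EOF part-way through a batch. */
  public interface EntrySource {
    /** Returns the size in bytes of the next entry, or -1 when nothing is left. */
    long nextEntrySize() throws EOFException;
  }

  /**
   * Reads entries into a batch, charging each one to the shared counter.
   * If EOF interrupts the read, the bytes charged for this batch are released
   * before the exception propagates, so a retry starts from a clean count.
   * On success the charge is kept; in this sketch the caller would release it
   * once the batch has been consumed.
   */
  public static List<Long> readBatch(EntrySource source) throws EOFException {
    List<Long> batch = new ArrayList<>();
    long charged = 0;
    try {
      long size;
      while ((size = source.nextEntrySize()) >= 0) {
        batch.add(size);
        totalBufferUsed.addAndGet(size);
        charged += size;
      }
      return batch;
    } catch (EOFException e) {
      // Undo only what this batch charged, then let the caller retry.
      totalBufferUsed.addAndGet(-charged);
      throw e;
    }
  }

  public static void main(String[] args) {
    // Yields two 64-byte entries, then throws EOF as if the third were truncated.
    EntrySource truncated = new EntrySource() {
      private int calls = 0;

      @Override
      public long nextEntrySize() throws EOFException {
        calls++;
        if (calls <= 2) {
          return 64;
        }
        throw new EOFException("simulated truncated entry");
      }
    };
    try {
      readBatch(truncated);
    } catch (EOFException e) {
      System.out.println("EOF; totalBufferUsed=" + totalBufferUsed.get());
    }
  }
}
```

Without the `catch` block, each retry against the same truncated file would add another 128 bytes to the counter in this sketch, which is the kind of slow leak that could eventually exhaust a shared quota and stall every reader that depends on it.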
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
