ddupg commented on code in PR #4756:
URL: https://github.com/apache/hbase/pull/4756#discussion_r962458619
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java:
##########
@@ -255,15 +255,21 @@ private void dequeueCurrentLog() throws IOException {
* Returns whether the file is opened for writing.
*/
private boolean readNextEntryAndRecordReaderPosition() throws IOException {
+ long prePos = reader.getPosition();
Review Comment:
Thanks for reviewing.
I haven't yet observed a case where we actually read an incomplete entry. The EOFException is
thrown by `org.apache.hadoop.fs.FSDataInputStream#seek`.
After reading uncommitted data, we reopen the `FSDataInputStream` and
seek back. If the position we seek back to is beyond the synced length, that can cause
EOF. Since the 3 replicas of an HDFS block that is still being written can be inconsistent, even if
we can read data at a specific position from one replica, the same position may not be readable
from another.
It's not a big problem if it just throws EOF, since we'll retry. The big problem
is that if we have already read some entries into the `WALEntryBatch` and increased
[`totalBufferUsed`](https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L78),
and `totalBufferUsed` is not subtracted back after the EOF is thrown, the shared quota leaks on
every such failure, so all peers will eventually block completely.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]