[
https://issues.apache.org/jira/browse/HBASE-27963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740668#comment-17740668
]
Rushabh Shah commented on HBASE-27963:
--------------------------------------
We are also seeing similar errors in our production environment, where we run
a version of 1.7. As a workaround, we restart the regionserver, and the new
regionserver is able to replicate. This suggests that some in-memory data
structure is out of sync.
> Replication stuck when switch to new reader
> -------------------------------------------
>
> Key: HBASE-27963
> URL: https://issues.apache.org/jira/browse/HBASE-27963
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 3.0.0-alpha-4, 2.4.17, 2.5.5
> Reporter: Xiaolin Ha
> Assignee: Xiaolin Ha
> Priority: Major
>
> After creating a new reader for the next WAL, it immediately calls seek() to
> currentPositionOfEntry, but this position may exceed the length of the
> current WAL.
> {code:java}
> WARN [RpcServer.default.FPRWQ.Fifo.read.handler=101,queue=1,port=16020.replicationSource.wal-reader.XXXXXXX] regionserver.ReplicationSourceWALReader: Failed to read stream of replication entries
> java.io.EOFException: Cannot seek after EOF
>     at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1488)
>     at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:62)
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.seekOnFs(ProtobufLogReader.java:495)
>     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.seek(ReaderBase.java:138)
>     at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.seek(WALEntryStream.java:399)
>     at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openReader(WALEntryStream.java:341)
>     at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.handleFileNotFound(WALEntryStream.java:328)
>     at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openReader(WALEntryStream.java:347)
>     at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openNextLog(WALEntryStream.java:310)
>     at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.checkReader(WALEntryStream.java:300)
>     at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:176)
>     at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:102)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.tryAdvanceStreamAndCreateWALBatch(ReplicationSourceWALReader.java:260)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:142)
> {code}
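
The failure mode described above (a position carried over from the previous
WAL being applied to a shorter next WAL) can be illustrated with a small,
self-contained sketch. Everything in it is hypothetical: SeekGuard,
FakeWalStream, and startPosition are illustrative names, not HBase or HDFS
classes, and resetting to offset 0 is only one possible guard, not necessarily
the fix adopted for this issue.

```java
import java.io.EOFException;

class SeekGuard {

    /** Hypothetical stand-in for a seekable WAL stream; not an HBase/HDFS class. */
    static final class FakeWalStream {
        private final long length;
        private long position;

        FakeWalStream(long length) {
            this.length = length;
        }

        /** Mirrors the behaviour seen in the trace: seeking past the end fails. */
        void seek(long target) throws EOFException {
            if (target > length) {
                throw new EOFException("Cannot seek after EOF");
            }
            position = target;
        }

        long position() {
            return position;
        }
    }

    /**
     * Assumed guard: if the carried-over position does not fit in the newly
     * opened file, start from the beginning of that file instead.
     */
    static long startPosition(long carriedOverPosition, long fileLength) {
        if (carriedOverPosition < 0 || carriedOverPosition > fileLength) {
            return 0L;
        }
        return carriedOverPosition;
    }

    public static void main(String[] args) throws EOFException {
        long nextWalLength = 1_000L;   // the next WAL is short
        long stalePosition = 4_096L;   // position left over from the previous WAL

        FakeWalStream nextWal = new FakeWalStream(nextWalLength);
        // Without the guard, nextWal.seek(stalePosition) would throw EOFException.
        nextWal.seek(startPosition(stalePosition, nextWalLength));
        System.out.println("reader positioned at " + nextWal.position());
    }
}
```

Whether offset 0 is the right fallback depends on how the real reader tracks
its position across WAL files; the sketch only shows that checking the
position against the file length before seeking avoids the EOFException.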