[
https://issues.apache.org/jira/browse/HBASE-27073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683322#comment-17683322
]
Duo Zhang commented on HBASE-27073:
-----------------------------------
While testing 2.5.3RC2, I found that this UT fails easily when run on a
loaded machine. The error looks like this:
{noformat}
2023-02-02T16:53:34,165 DEBUG [RS_REFRESH_PEER-regionserver/zhangduo-VirtualBox:0-0.replicationSource,2.replicationSource.wal-reader.zhangduo-virtualbox%2C33915%2C1675327981383,2] wal.ProtobufLogReader(448): Encountered a malformed edit, seeking back to last good position in file, from 65558 to 65536
java.io.EOFException: Invalid PB, EOF? Ignoring; originalPosition=65536, currentPosition=65558, messageSize=21, currentAvailable=434
    at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:383) ~[classes/:?]
    at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:104) ~[classes/:?]
    at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:92) ~[classes/:?]
    at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.readNextEntryAndRecordReaderPosition(WALEntryStream.java:258) ~[classes/:?]
    at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:172) ~[classes/:?]
    at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:101) ~[classes/:?]
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.tryAdvanceStreamAndCreateWALBatch(ReplicationSourceWALReader.java:241) ~[classes/:?]
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:139) ~[classes/:?]
Caused by: org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
    at org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:133) ~[hbase-shaded-protobuf-4.1.4.jar:4.1.4]
    at org.apache.hbase.thirdparty.com.google.protobuf.CodedInputStream$StreamDecoder.readTag(CodedInputStream.java:2122) ~[hbase-shaded-protobuf-4.1.4.jar:4.1.4]
    at org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:2778) ~[hbase-protocol-shaded-2.5.3.jar:2.5.3]
    at org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:2396) ~[hbase-protocol-shaded-2.5.3.jar:2.5.3]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:418) ~[hbase-shaded-protobuf-4.1.4.jar:4.1.4]
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:317) ~[hbase-shaded-protobuf-4.1.4.jar:4.1.4]
    at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.mergeFrom(ProtobufUtil.java:2564) ~[hbase-client-2.5.3.jar:2.5.3]
    at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:379) ~[classes/:?]
    ... 7 more
{noformat}
Obviously the message size is incorrect.
Will dig more.
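
To make the numbers in the log easier to read: the positions are consistent with
a length-delimited protobuf read, where a one-byte varint length prefix (21) is
followed by 21 message bytes, giving 65558 - 65536 = 22 bytes consumed. The
sketch below is only an illustration of that arithmetic and of how zero-valued
bytes at the read position produce "invalid tag (zero)". It is not the
ProtobufLogReader code and not a confirmed diagnosis; the class and method names
are hypothetical, and the zero-filled buffer is an assumption made for the
example.

```java
public class DelimitedReadSketch {
  /** A decoded varint plus the number of bytes it occupied. (Hypothetical helper.) */
  public static final class Varint {
    public final int value;
    public final int length;

    public Varint(int value, int length) {
      this.value = value;
      this.length = length;
    }
  }

  /** Decodes a protobuf base-128 varint (at most 5 bytes) starting at pos. */
  public static Varint readVarint32(byte[] buf, int pos) {
    int result = 0;
    for (int i = 0; i < 5; i++) {
      if (pos + i >= buf.length) {
        throw new IllegalStateException("EOF inside varint");
      }
      int b = buf[pos + i] & 0xFF;
      result |= (b & 0x7F) << (7 * i);
      if ((b & 0x80) == 0) {
        return new Varint(result, i + 1);
      }
    }
    throw new IllegalStateException("malformed varint");
  }

  /** Bytes a delimited record occupies: the length prefix plus the message body. */
  public static int bytesConsumed(byte[] buf, int pos) {
    Varint size = readVarint32(buf, pos);
    return size.length + size.value;
  }

  /**
   * Field number of the first tag in the delimited message at pos.
   * Returns -1 for an empty message; 0 is never a valid field number.
   */
  public static int firstFieldNumber(byte[] buf, int pos) {
    Varint size = readVarint32(buf, pos);
    if (size.value == 0) {
      return -1;
    }
    Varint tag = readVarint32(buf, pos + size.length);
    return tag.value >>> 3; // the low 3 bits of a tag hold the wire type
  }

  public static void main(String[] args) {
    // Assumed scenario: the prefix byte says 21, but the body is all zeros.
    byte[] suspect = new byte[22];
    suspect[0] = 21;
    System.out.println("consumed=" + bytesConsumed(suspect, 0)
        + " firstFieldNumber=" + firstFieldNumber(suspect, 0));

    // A well-formed record for comparison: field 1, varint wire type, value 1.
    byte[] valid = {2, 0x08, 0x01};
    System.out.println("consumed=" + bytesConsumed(valid, 0)
        + " firstFieldNumber=" + firstFieldNumber(valid, 0));
  }
}
```

A first field number of 0 is what a protobuf parser rejects as an invalid tag,
so a size prefix read from a position that does not hold a complete record would
be enough to produce the exception above.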
> TestReplicationValueCompressedWAL.testMultiplePuts is flaky
> -----------------------------------------------------------
>
> Key: HBASE-27073
> URL: https://issues.apache.org/jira/browse/HBASE-27073
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.5.0
> Environment: Java version: 1.8.0_322
> OS name: "linux", version: "5.10.0-13-arm64", arch: "aarch64", family: "unix"
> Reporter: Andrew Kyle Purtell
> Priority: Minor
> Fix For: 2.6.0, 3.0.0-alpha-4, 2.5.4
>
>
> org.apache.hadoop.hbase.replication.regionserver.TestReplicationValueCompressedWAL.testMultiplePuts
> Run 1: TestReplicationValueCompressedWAL.testMultiplePuts:56 Waited too much time for replication
> Run 2: PASS