[ 
https://issues.apache.org/jira/browse/HBASE-27073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683322#comment-17683322
 ] 

Duo Zhang commented on HBASE-27073:
-----------------------------------

While testing 2.5.3RC2, I found that this UT fails easily when run on a loaded
machine. The error looks like this:

{noformat}
2023-02-02T16:53:34,165 DEBUG [RS_REFRESH_PEER-regionserver/zhangduo-VirtualBox:0-0.replicationSource,2.replicationSource.wal-reader.zhangduo-virtualbox%2C33915%2C1675327981383,2] wal.ProtobufLogReader(448): Encountered a malformed edit, seeking back to last good position in file, from 65558 to 65536
java.io.EOFException: Invalid PB, EOF? Ignoring; originalPosition=65536, currentPosition=65558, messageSize=21, currentAvailable=434
        at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:383) ~[classes/:?]
        at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:104) ~[classes/:?]
        at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:92) ~[classes/:?]
        at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.readNextEntryAndRecordReaderPosition(WALEntryStream.java:258) ~[classes/:?]
        at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:172) ~[classes/:?]
        at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:101) ~[classes/:?]
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.tryAdvanceStreamAndCreateWALBatch(ReplicationSourceWALReader.java:241) ~[classes/:?]
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:139) ~[classes/:?]
Caused by: org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
        at org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:133) ~[hbase-shaded-protobuf-4.1.4.jar:4.1.4]
        at org.apache.hbase.thirdparty.com.google.protobuf.CodedInputStream$StreamDecoder.readTag(CodedInputStream.java:2122) ~[hbase-shaded-protobuf-4.1.4.jar:4.1.4]
        at org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:2778) ~[hbase-protocol-shaded-2.5.3.jar:2.5.3]
        at org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:2396) ~[hbase-protocol-shaded-2.5.3.jar:2.5.3]
        at org.apache.hbase.thirdparty.com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:418) ~[hbase-shaded-protobuf-4.1.4.jar:4.1.4]
        at org.apache.hbase.thirdparty.com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:317) ~[hbase-shaded-protobuf-4.1.4.jar:4.1.4]
        at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.mergeFrom(ProtobufUtil.java:2564) ~[hbase-client-2.5.3.jar:2.5.3]
        at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:379) ~[classes/:?]
        ... 7 more
{noformat}

Obviously the message size read here (messageSize=21) is incorrect.

Will dig more.
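
For readers following along, here is a minimal, self-contained sketch of how the numbers in the log fit together. It is not the HBase reader: the class and method names (DelimitedReadSketch, readVarint, bytesAdvanced, isValidTag) are hypothetical and exist only for illustration. It assumes a record is laid out as a varint length prefix followed by the payload, and relies on the protobuf wire-format rule that a field tag is (fieldNumber << 3) | wireType, with field number 0 being invalid. Under those assumptions, the jump from 65536 to 65558 matches a 1-byte prefix plus a 21-byte payload, and a zero byte where a tag is expected reproduces the "invalid tag (zero)" condition. It does not show why the size was wrong.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; not HBase or protobuf library code.
public class DelimitedReadSketch {

    /** Reads a base-128 varint; returns {value, bytesConsumed}. */
    static long[] readVarint(InputStream in) throws IOException {
        long value = 0;
        int consumed = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("EOF inside varint");
            }
            consumed++;
            value |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return new long[] {value, consumed};
            }
        }
        throw new IOException("malformed varint");
    }

    /** Bytes consumed by one length-delimited record: varint prefix plus payload. */
    static long bytesAdvanced(long messageSize) {
        int prefixLen = 1;
        for (long v = messageSize >>> 7; v != 0; v >>>= 7) {
            prefixLen++;
        }
        return prefixLen + messageSize;
    }

    /** A tag whose field number (tag >>> 3) is 0 is never valid on the wire. */
    static boolean isValidTag(long tag) {
        return (tag >>> 3) != 0;
    }

    public static void main(String[] args) throws IOException {
        // Values copied from the log line above.
        long originalPosition = 65536, currentPosition = 65558, messageSize = 21;
        System.out.println(currentPosition - originalPosition == bytesAdvanced(messageSize));

        // A 22-byte record: prefix says 21, but the payload is all zero bytes.
        byte[] record = new byte[22];
        record[0] = 21;
        InputStream in = new ByteArrayInputStream(record);
        long size = readVarint(in)[0];
        long firstTag = readVarint(in)[0];
        System.out.println("size=" + size + " tagValid=" + isValidTag(firstTag));
    }
}
```

Running main prints true and then size=21 tagValid=false. This only shows that the logged positions are consistent with a 21-byte record whose bytes do not parse as a WALKey.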

> TestReplicationValueCompressedWAL.testMultiplePuts is flaky
> -----------------------------------------------------------
>
>                 Key: HBASE-27073
>                 URL: https://issues.apache.org/jira/browse/HBASE-27073
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.5.0
>         Environment: Java version: 1.8.0_322
> OS name: "linux", version: "5.10.0-13-arm64", arch: "aarch64", family: "unix"
>            Reporter: Andrew Kyle Purtell
>            Priority: Minor
>             Fix For: 2.6.0, 3.0.0-alpha-4, 2.5.4
>
>
> org.apache.hadoop.hbase.replication.regionserver.TestReplicationValueCompressedWAL.testMultiplePuts
>   Run 1: TestReplicationValueCompressedWAL.testMultiplePuts:56 Waited too much time for replication
>   Run 2: PASS



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
