[ https://issues.apache.org/jira/browse/HBASE-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Daniel Cryans updated HBASE-9373:
--------------------------------------
Attachment: 9373-v2.txt
This v2 builds on Stack's patch: if we get partial data, it re-seeks to our
original position and then returns false (basically, if we get a partial read,
we roll back our latest read and act as if it never happened).
It fixed replication for me, at least in the few tests that I ran. I also
checked the relevant logs and saw replication doing the right thing. I saw
maybe 2-3 partial reads, but replication would eventually get to see the full
data.
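
To illustrate the rollback idea outside of HBase, here is a minimal,
self-contained sketch. It is not the code from 9373-v2.txt: GrowingLog,
readNext and encode are hypothetical names, the record format (a 4-byte length
followed by a UTF-8 body) is an assumption made for the example, and readNext
returns null where the patch returns false.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class PartialReadRollback {

    /** Hypothetical stand-in for a log file that is still being written. */
    static final class GrowingLog {
        private final byte[] data;
        private int visible; // number of bytes the reader can currently see
        private int pos;     // current read position

        GrowingLog(byte[] data, int visible) {
            this.data = data;
            this.visible = visible;
        }

        void setVisible(int visible) { this.visible = visible; }
        int getPos() { return pos; }
        void seek(int newPos) { pos = newPos; }

        /** Reads up to buf.length bytes; may return fewer (a partial read). */
        int read(byte[] buf) {
            int n = Math.min(buf.length, visible - pos);
            System.arraycopy(data, pos, buf, 0, n);
            pos += n;
            return n;
        }
    }

    /**
     * Reads one length-prefixed record. On a partial read, seeks back to the
     * position held before the attempt and returns null, so the next call
     * retries the same record from its first byte.
     */
    static String readNext(GrowingLog log) {
        int start = log.getPos();
        byte[] header = new byte[4];
        if (log.read(header) < header.length) {
            log.seek(start); // partial header: roll back
            return null;
        }
        int len = ByteBuffer.wrap(header).getInt();
        byte[] body = new byte[len];
        if (log.read(body) < len) {
            log.seek(start); // partial body: roll back
            return null;
        }
        return new String(body, StandardCharsets.UTF_8);
    }

    /** Encodes records as [4-byte length][UTF-8 bytes] (assumed format). */
    static byte[] encode(String... records) {
        int size = 0;
        for (String r : records) {
            size += 4 + r.getBytes(StandardCharsets.UTF_8).length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (String r : records) {
            byte[] b = r.getBytes(StandardCharsets.UTF_8);
            buf.putInt(b.length).put(b);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        // Two records (11 bytes total), but only the first 10 bytes are visible.
        GrowingLog log = new GrowingLog(encode("a", "bc"), 10);
        System.out.println(readNext(log)); // first record is complete
        System.out.println(readNext(log)); // second is truncated: rolled back
        log.setVisible(11);                // the writer catches up
        System.out.println(readNext(log)); // the retry now sees the full record
    }
}
```

Without the seek back to start, the retry would begin in the middle of the
second record and misparse whatever bytes follow, which is the wrong-offset
symptom described in the issue.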
> [replication] data loss because replication doesn't expect partial reads
> ------------------------------------------------------------------------
>
> Key: HBASE-9373
> URL: https://issues.apache.org/jira/browse/HBASE-9373
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.95.2
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
> Priority: Blocker
> Fix For: 0.98.0, 0.96.0
>
> Attachments: 9373.txt, 9373-v2.txt
>
>
> When I see this in the logs, it often means we got a partial read and then we
> have the wrong offset when reading the rest of the file:
> {noformat}
> 2013-08-28 23:16:07,182 ERROR [ReplicationExecutor-0.replicationSource,1-jdec2hbase0403-5,60020,1377730319617] org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader: Invalid PB while reading WAL, probably an unexpected EOF, ignoring
> com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.
>     at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
>     at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
>     at com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
>     at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey.<init>(WALProtos.java:686)
>     at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey.<init>(WALProtos.java:644)
>     at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:771)
>     at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:766)
>     at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:1444)
>     at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:1218)
>     at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:220)
>     at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:912)
>     at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:267)
>     at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:290)
>     at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:926)
>     at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:296)
>     at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:918)
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:197)
>     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:98)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:89)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:390)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:298)
> {noformat}