[ 
https://issues.apache.org/jira/browse/HBASE-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-9373:
--------------------------------------

    Attachment: 9373-v2.txt

This v2 augments Stack's patch by re-seeking to our original position if we get 
partial data and then return false (basically if we get partial reads, roll 
back our latest read and act as if it wasn't there).

It fixed replication for me, at least in the few tests that I ran. I also saw 
the relevant logs and saw replication doing the right thing. I might saw 2-3 
partial reads but then replication would finally get to see the full data.
                
> [replication] data loss because replication doesn't expect partial reads
> ------------------------------------------------------------------------
>
>                 Key: HBASE-9373
>                 URL: https://issues.apache.org/jira/browse/HBASE-9373
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.95.2
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.98.0, 0.96.0
>
>         Attachments: 9373.txt, 9373-v2.txt
>
>
> When I see this in the logs it often means we got a partial read and then we 
> have the wrong offset when reading the rest of the file
> {noformat}
> 2013-08-28 23:16:07,182 ERROR 
> [ReplicationExecutor-0.replicationSource,1-jdec2hbase0403-5,60020,1377730319617]
>  org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader: Invalid PB while 
> reading WAL, probably an unexpected EOF, ignoring
> com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had 
> invalid wire type.
>         at 
> com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
>         at 
> com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
>         at 
> com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey.<init>(WALProtos.java:686)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey.<init>(WALProtos.java:644)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:771)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:766)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:1444)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:1218)
>         at 
> com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:220)
>         at 
> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:912)
>         at 
> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:267)
>         at 
> com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:290)
>         at 
> com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:926)
>         at 
> com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:296)
>         at 
> com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:918)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:197)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:98)
>         at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:89)
>         at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:390)
>         at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:298)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to