[
https://issues.apache.org/jira/browse/HBASE-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754030#comment-13754030
]
stack commented on HBASE-9373:
------------------------------
Looking at the pb code (see below), this looks like corruption: the exception
is only thrown when the tag's wire type is not one the parser recognizes. The
odd part is that we recover on a subsequent read (JD says that replication will
reopen the file if it does not get a record because of a failed parse). Let me
add a patch w/ more detail around what is going on in here.
{code}
/**
 * Parse a single field from {@code input} and merge it into this set.
 * @param tag The field's tag number, which was already parsed.
 * @return {@code false} if the tag is an end group tag.
 */
public boolean mergeFieldFrom(final int tag, final CodedInputStream input)
    throws IOException {
  final int number = WireFormat.getTagFieldNumber(tag);
  switch (WireFormat.getTagWireType(tag)) {
    case WireFormat.WIRETYPE_VARINT:
      getFieldBuilder(number).addVarint(input.readInt64());
      return true;
    case WireFormat.WIRETYPE_FIXED64:
      getFieldBuilder(number).addFixed64(input.readFixed64());
      return true;
    case WireFormat.WIRETYPE_LENGTH_DELIMITED:
      getFieldBuilder(number).addLengthDelimited(input.readBytes());
      return true;
    case WireFormat.WIRETYPE_START_GROUP:
      final Builder subBuilder = newBuilder();
      input.readGroup(number, subBuilder,
                      ExtensionRegistry.getEmptyRegistry());
      getFieldBuilder(number).addGroup(subBuilder.build());
      return true;
    case WireFormat.WIRETYPE_END_GROUP:
      return false;
    case WireFormat.WIRETYPE_FIXED32:
      getFieldBuilder(number).addFixed32(input.readFixed32());
      return true;
    default:
      throw InvalidProtocolBufferException.invalidWireType();
  }
}
{code}
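For anyone following along, here is a minimal sketch of how a tag splits into
a field number and a wire type, assuming the standard protobuf layout (low
three bits are the wire type, the remaining bits are the field number). The
class WireTagDemo and its method names are made up for illustration; they are
not part of protobuf or HBase. It only shows why a tag whose low bits are 6 or
7 falls through to the default branch above, which is what you would expect to
see now and then if the reader is looking at garbage or a partially written
record.

```java
final class WireTagDemo {
    // Assumed to match protobuf's tag layout: 3 low bits hold the wire type.
    static final int TAG_TYPE_BITS = 3;
    static final int TAG_TYPE_MASK = (1 << TAG_TYPE_BITS) - 1;

    // Wire type is the low three bits of the tag.
    static int wireType(int tag) {
        return tag & TAG_TYPE_MASK;
    }

    // Field number is everything above the wire type bits.
    static int fieldNumber(int tag) {
        return tag >>> TAG_TYPE_BITS;
    }

    // Wire types 0..5 have a case in the switch; 6 and 7 do not.
    static boolean isKnownWireType(int tag) {
        return wireType(tag) <= 5;
    }

    static String describe(int tag) {
        return "tag=0x" + Integer.toHexString(tag)
            + " field=" + fieldNumber(tag)
            + " wireType=" + wireType(tag)
            + (isKnownWireType(tag) ? "" : " (would hit the default branch)");
    }

    public static void main(String[] args) {
        // 0x08 and 0x12 are ordinary tags; 0x0E and 0x0F carry wire types 6 and 7.
        int[] tags = {0x08, 0x12, 0x0E, 0x0F};
        for (int tag : tags) {
            System.out.println(describe(tag));
        }
    }
}
```

Since only two of the eight possible low-bit patterns are rejected, random
bytes will often parse as a "valid" tag, so not hitting this exception does
not by itself prove that the bytes read were a well-formed record.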
> Fix more log spam in replication for 0.96.0
> -------------------------------------------
>
> Key: HBASE-9373
> URL: https://issues.apache.org/jira/browse/HBASE-9373
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.95.2
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
> Fix For: 0.98.0, 0.96.0
>
>
> Two things that are bugging me.
> First this one where we try to be more responsive now and only sleep 1 second
> if we didn't get data. Let's set it down to TRACE.
> bq. 2013-08-28 23:17:47,421 DEBUG [regionserver60020.replicationSource,1]
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Nothing
> to replicate, sleeping 1000 times 1
> Then I've seen cases where we can hit an EOF and instead of just being silent
> we hit this:
> {noformat}
> 2013-08-28 23:16:07,182 ERROR [ReplicationExecutor-0.replicationSource,1-jdec2hbase0403-5,60020,1377730319617] org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader: Invalid PB while reading WAL, probably an unexpected EOF, ignoring
> com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.
>     at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
>     at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
>     at com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
>     at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey.<init>(WALProtos.java:686)
>     at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey.<init>(WALProtos.java:644)
>     at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:771)
>     at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:766)
>     at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:1444)
>     at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:1218)
>     at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:220)
>     at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:912)
>     at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:267)
>     at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:290)
>     at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:926)
>     at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:296)
>     at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:918)
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:197)
>     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:98)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:89)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:390)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:298)
> {noformat}
> The problem here is that it shows up as an ERROR, which suggests there
> really could be a problem. Is that the intention? Or would a real problem
> manifest itself in some other way anyway if we silence this exception?
> [~stack]? FWIW I verified that I had all my data.