[
https://issues.apache.org/jira/browse/TEZ-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440000#comment-16440000
]
Jonathan Eagles commented on TEZ-3914:
--------------------------------------
Essentially, the underlying serializer/deserializer for protobuf messages is
the CodedOutputStream/CodedInputStream. When messages are written either with
the writeDelimitedTo (read parallel is parseDelimitedFrom) or writeMessageNoTag
(parallel is readMessage), first the serialized size is written and then the
serialized message is written. In addition to this, Tez prepends the message
type to handle message dispatching. To protect itself from remote messages,
CodedInputStream has a default serialized size limit of 64 MB. When using the
parseDelimitedFrom java.io.InputStream method, new underlying deserializer
CodedInputStream is created using the default message size limit. This prevents
larger than 64MB Tez recovery messages from being properly parsed and prevents
further messages after the large message from being parsed as well since any
Exception is treated the same as EOF.
There are a few possible solutions to this.
1) Ensure all recovery messages are bounded in size less than 64 MB
2) Ensure messages larger than 64 MB can be successfully read
3) Ensure large messages are non-fatal and skip them.
This jira takes aproach 2 since recovery data is not lost. In addition, this
jira replaces the higher level writeDelimitedTo/parseDelimitedFrom with the
writeMessageNoTag/readMessage. This allows control over the maximum message
size as required. In addition, recover writes and reads are now more efficient
as serializing/deserializing streams (CodedInputStream/CodedOutputStream) are
now reused.
> Recovering a large DAG fails to size limit exceeded
> ---------------------------------------------------
>
> Key: TEZ-3914
> URL: https://issues.apache.org/jira/browse/TEZ-3914
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Priority: Major
> Attachments: TEZ-3914.001.patch, TEZ-3914.002.patch,
> TEZ-3914.003.patch
>
>
> A large message will be failed to parse and will be treated as recovery file
> EOF.
> {noformat}
> 2018-04-16 15:33:59,807 WARN [Thread-2] app.RecoveryParser
> (RecoveryParser.java:parseRecoveryData(771)) - Corrupt data found when trying
> to read next event
> com.google.protobuf.InvalidProtocolBufferException: Protocol message was too
> large. May be malicious. Use CodedInputStream.setSizeLimit() to increase
> the size limit.
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)