Mengran Wang created AVRO-2882:
----------------------------------
Summary: Validate input data format before decoding it
Key: AVRO-2882
URL: https://issues.apache.org/jira/browse/AVRO-2882
Project: Apache Avro
Issue Type: Improvement
Components: java
Affects Versions: 1.9.2, 1.8.2
Reporter: Mengran Wang
Attachments: Screen Shot 2020-06-18 at 5.48.39 PM.png
When decoding a byte array with the Avro BinaryDecoder and
SpecificDatumReader, is it possible to use the schema to check whether the
input matches the definition before allocating a memory buffer to process the
data?
We hit a bug in production with a payload type that consists of two parts:
the first part is a fixed-size byte array and the second part is a record of
variable-length strings. During deserialization, we extract the byte array
first (using schema A) and then read the strings (using schema B). However, we
accidentally created a malformed payload that left out the byte array part. We
expected Avro to throw some kind of RuntimeException when decoding this
malformed payload, but instead it allocated a huge memory buffer,
*scratchUtf8*, to read the string, which eventually caused a JVM OOM error on
our end.
{code:java}
fixed MD5(16); // fixed length

record A {
  MD5 hash;
}

record B {
  string name1;
  string name2;
  union {null, string} name3 = null;
}
{code}
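To illustrate the kind of check being asked for, here is a minimal sketch in plain Java. It is not Avro's implementation and uses no Avro classes; the names LengthPrefixCheck, readZigZagLong and readStringChecked are hypothetical. It relies only on the fact that Avro's binary encoding writes a string as a zigzag varint long (the byte length) followed by that many UTF-8 bytes. When the 16-byte fixed part is missing, the reader skips 16 bytes of string data and then interprets whatever comes next as a length prefix. The sketch assumes the whole payload is in memory as a byte array, so the declared length can be compared with the bytes that actually remain before anything is allocated.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Avro: validates a string's length prefix
// against the bytes actually available before allocating anything.
final class LengthPrefixCheck {

    // Reads one zigzag-encoded varint starting at pos[0] and advances pos[0].
    static long readZigZagLong(byte[] buf, int[] pos) {
        long raw = 0;
        int shift = 0;
        while (true) {
            if (pos[0] >= buf.length) {
                throw new IllegalArgumentException("truncated varint");
            }
            if (shift > 63) {
                throw new IllegalArgumentException("varint longer than 10 bytes");
            }
            int b = buf[pos[0]++] & 0xff;
            raw |= (long) (b & 0x7f) << shift; // low 7 bits carry the payload
            if ((b & 0x80) == 0) {
                break; // high bit clear: this was the last byte
            }
            shift += 7;
        }
        return (raw >>> 1) ^ -(raw & 1); // undo the zigzag mapping
    }

    // Reads a length-prefixed UTF-8 string, rejecting any declared length
    // that is negative or larger than the number of bytes left in buf.
    static String readStringChecked(byte[] buf, int[] pos) {
        long len = readZigZagLong(buf, pos);
        long remaining = buf.length - pos[0];
        if (len < 0 || len > remaining) {
            throw new IllegalArgumentException(
                "declared string length " + len + " but only " + remaining + " bytes remain");
        }
        String s = new String(buf, pos[0], (int) len, StandardCharsets.UTF_8);
        pos[0] += (int) len;
        return s;
    }
}
```

For example, the five bytes 0xfe 0xff 0xff 0xff 0x0f decode to a declared length of 2147483647. With the check above, a payload that starts with those bytes and has only a few bytes left is rejected with an exception instead of triggering a 2 GB allocation. A decoder reading from a stream cannot know the remaining size, so there a configurable upper bound on string length would be the closer equivalent.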
--
This message was sent by Atlassian Jira
(v8.3.4#803005)