[
https://issues.apache.org/jira/browse/AVRO-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889639#comment-16889639
]
Stan Rosenberg commented on AVRO-2478:
--------------------------------------
The suggested fix replaced RecordBuilderBase.validate with the following,
{code:java}
/**
* @param s
* -- avro schema
* @return -- true iff s denotes a nullable union
*/
public static boolean isNullableUnion(Schema s) {
if (s.getType() != Schema.Type.UNION) {
return false;
}
for (Schema t : s.getTypes()) {
if (t.getType() == Schema.Type.NULL) {
return true;
}
}
return false;
}
/**
* Validates that the value is compatible with the nullability of field's type.
* There are 3 cases to consider,
*
* 1. value == null must imply field's type is a nullable union
* 2. value instanceof Map must imply field's type is a Map of nullable union
* 3. value instanceof List must imply field's type is an Array of nullable
union
*
* NOTE: this implementation is intended to replace RecordBuilderBase.validate
which contains bugs; see,
*
* AVRO-2478, AVRO-1927, AVRO-1716
*/
public static void validateNullable(Schema.Field field, Object value) {
Schema schema = field.schema();
if (value == null) {
// verify field's type is nullable
if (!isNullableUnion(schema)) {
throw new AvroRuntimeException("Field " + field + " does not accept
null values");
}
} else if (value instanceof Map) {
if (!isNullableUnion(schema.getValueType()) && ((Map)
value).values().contains(null)) {
throw new AvroRuntimeException("Field " + field + " does not accept
null Map value");
}
} else if (value instanceof List) {
if (!isNullableUnion(schema.getElementType()) &&
((List)value).contains(null)) {
throw new AvroRuntimeException("Field " + field + " does not accept
null Array element");
}
}
}{code}
There are a couple of nuances,
* in composite case, linear scan of the collection incurs a performance penalty
* verifying nullability of nested collections (e.g., array<array<union \{null,
string}>>>) is intentionally left out since it incurs an even higher
performance penalty
Since Builder is intended to ensure that only well-defined records are built,
the added penalty for linear (collection) scan is justified. Furthermore,
considering that a large percentage of use-cases involves either an un-nested
Array or Map, the above would catch a majority of NPEs. Conceivably, a full
traversal of a nested collection could be performed if a Builder is explicitly
instantiated with that feature flag, e.g., 'fullNullabilityCheck', thus
addressing the remaining use-cases.
> RecordBuilderBase.validate doesn't check nullability of composite types
> -----------------------------------------------------------------------
>
> Key: AVRO-2478
> URL: https://issues.apache.org/jira/browse/AVRO-2478
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.9.0, 1.8.2
> Reporter: Stan Rosenberg
> Priority: Minor
>
> Builder classes generated from record.vm make use of
> RecordBuilderBase.validate in every setter to ensure that the passed value
> (primitive or composite) is compatible with the nullability of the
> corresponding Avro schema type. The problem is the validation doesn't extend
> to composite types, namely Maps and Arrays. This breaks the invariant that
> any record constructed via Builder is well-defined (i.e., can be serialized).
> E.g., field of type array<string> would yield the following setter,
> {code:java}
> x.y.z.Builder setXXX(java.util.List<java.lang.String> value){code}
> Thus, a collection with a null element will result in a record which breaks
> serialization, e.g.,
> {code:java}
> datumWriter.write(builder.setXXX(Collections.singletonList(null)).build()){code}
> will throw NPE.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)