[ 
https://issues.apache.org/jira/browse/AVRO-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889639#comment-16889639
 ] 

Stan Rosenberg commented on AVRO-2478:
--------------------------------------

The suggested fix replaced RecordBuilderBase.validate with the following,
{code:java}
/**
 * @param s
 *            -- avro schema
 * @return -- true iff s denotes a nullable union
 */
public static boolean isNullableUnion(Schema s) {
    if (s.getType() != Schema.Type.UNION) {
        return false;
    }
    for (Schema t : s.getTypes()) {
        if (t.getType() == Schema.Type.NULL) {
            return true;
        }
    }
    return false;
}

/**
 * Validates that the value is compatible with the nullability of field's type.
 * There are 3 cases to consider,
 *
 * 1. value == null must imply field's type is a nullable union
 * 2. value instanceof Map must imply field's type is a Map of nullable union
 * 3. value instanceof List must imply field's type is an Array of nullable 
union
 *
 * NOTE: this implementation is intended to replace RecordBuilderBase.validate 
which contains bugs; see,
 *
 * AVRO-2478, AVRO-1927, AVRO-1716
 */
public static void validateNullable(Schema.Field field, Object value) {
    Schema schema = field.schema();

    if (value == null) {
        // verify field's type is nullable
        if (!isNullableUnion(schema)) {
            throw new AvroRuntimeException("Field " + field + " does not accept 
null values");
        }
    } else if (value instanceof Map) {
        if (!isNullableUnion(schema.getValueType()) && ((Map) 
value).values().contains(null)) {
            throw new AvroRuntimeException("Field " + field + " does not accept 
null Map value");
        }
    } else if (value instanceof List) {
        if (!isNullableUnion(schema.getElementType()) && 
((List)value).contains(null)) {
            throw new AvroRuntimeException("Field " + field + " does not accept 
null Array element");
        }
    }
}{code}
There are a couple of nuances,
 * in composite case, linear scan of the collection incurs a performance penalty
 * verifying nullability of nested collections (e.g., array<array<union \{null, 
string}>>>) is intentionally left out since it incurs an even higher 
performance penalty

Since Builder is intended to ensure that only well-defined records are built, 
the added penalty for linear (collection) scan is justified.  Furthermore, 
considering that a large percentage of use-cases involves either an un-nested 
Array or Map, the above would catch a majority of NPEs.  Conceivably, a full 
traversal of a nested collection could be performed if a Builder is explicitly 
instantiated with that feature flag, e.g., 'fullNullabilityCheck', thus 
addressing the remaining use-cases.

> RecordBuilderBase.validate doesn't check nullability of composite types
> -----------------------------------------------------------------------
>
>                 Key: AVRO-2478
>                 URL: https://issues.apache.org/jira/browse/AVRO-2478
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.9.0, 1.8.2
>            Reporter: Stan Rosenberg
>            Priority: Minor
>
> Builder classes generated from record.vm make use of 
> RecordBuilderBase.validate in every setter to ensure that the passed value 
> (primitive or composite) is compatible with the nullability of the 
> corresponding Avro schema type.  The problem is the validation doesn't extend 
> to composite types, namely Maps and Arrays.  This breaks the invariant that 
> any record constructed via Builder is well-defined (i.e., can be serialized).
> E.g., field of type array<string> would yield the following setter,
> {code:java}
> x.y.z.Builder setXXX(java.util.List<java.lang.String> value){code}
> Thus, a collection with a null element will result in a record which breaks 
> serialization, e.g.,
> {code:java}
> datumWriter.write(builder.setXXX(Collections.singletonList(null)).build()){code}
> will throw NPE.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

Reply via email to