Alexandre Normand created AVRO-1145:
---------------------------------------
Summary: Can't read union of null and primitive from value written
with schema as primitive
Key: AVRO-1145
URL: https://issues.apache.org/jira/browse/AVRO-1145
Project: Avro
Issue Type: Bug
Components: java
Affects Versions: 1.7.0
Reporter: Alexandre Normand
Using the its Java's generic representation API and I have a problem dealing
with our current case of schema evolution. The scenario we're dealing with here
is making a primitive-type field optional by changing the field to be a
{{union}} of {{null}} and that primitive type.
I'm going to use a simple example. Basically, our schemas are:
Initial: A record with one field of type {{int}}
Second version: Same record, same field name but the type is now a union of
{{null}} and {{int}}
According to the [schema
resolution|http://avro.apache.org/docs/current/spec.html#Schema+Resolution]
chapter of Avro's spec, the resolution for such a case should be:
{code}
if reader's is a union, but writer's is not
The first schema in the reader's union that matches
the writer's schema is recursively resolved against
it. If none match, an error is signalled.
{code}
My interpretation is that we should resolve data serialized with the initial
schema properly as int is part of the union in the reader's schema.
However, when running a test of reading back a record serialized with version 1
using the version 2, I get
*{{org.apache.avro.AvroTypeException: Attempt to process a int when a union was
expected.}}*
Here's a test that shows exactly this:
{code}
@Test
public void testReadingUnionFromValueWrittenAsPrimitive() throws Exception {
Schema writerSchema = new Schema.Parser().parse("{\n" +
" \"type\":\"record\",\n" +
" \"name\":\"NeighborComparisons\",\n" +
" \"fields\": [\n" +
" {\"name\": \"test\",\n" +
" \"type\": \"int\" }]} ");
Schema readersSchema = new Schema.Parser().parse(" {\n" +
" \"type\":\"record\",\n" +
" \"name\":\"NeighborComparisons\",\n" +
" \"fields\": [ {\n" +
" \"name\": \"test\",\n" +
" \"type\": [\"null\", \"int\"],\n" +
" \"default\": null } ] }");
// Writing a record using the initial schema with the
// test field defined as an int
GenericData.Record record = new GenericData.Record(writerSchema);
record.put("test", Integer.valueOf(10));
ByteArrayOutputStream output = new ByteArrayOutputStream();
JsonEncoder jsonEncoder = EncoderFactory.get().
jsonEncoder(writerSchema, output);
GenericDatumWriter<GenericData.Record> writer = new
GenericDatumWriter<GenericData.Record>(writerSchema);
writer.write(record, jsonEncoder);
jsonEncoder.flush();
output.flush();
System.out.println(output.toString());
// We try reading it back using the second schema
// version where the test field is defined as a union of null and int
JsonDecoder jsonDecoder = DecoderFactory.get().
jsonDecoder(readersSchema, output.toString());
GenericDatumReader<GenericData.Record> reader =
new GenericDatumReader<GenericData.Record>(writerSchema,
readersSchema);
GenericData.Record read = reader.read(null, jsonDecoder);
// We should be able to assert that the value is 10 but it
// fails on reading the record before getting here
assertEquals(10, read.get("test"));
}
{code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira