[ https://issues.apache.org/jira/browse/AVRO-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jamie Olson updated AVRO-1456:
------------------------------
Description:
org.apache.avro.mapred.AvroAsTextInputFormat relies on the toString() method
rather than using org.apache.avro.generic.GenericDatumWriter.write() and
org.apache.avro.io.JsonEncoder as in org.apache.avro.tool.DataFileReadTool.
As a result, union values are serialized without the type name (fully
qualified, for named types) required by the JSON Encoding section of the
Avro Specification:
http://avro.apache.org/docs/1.7.6/spec.html#json_encoding
The specification indicates that, for the union type ["null","string","Foo"],
values should be serialized as follows:
* null as null;
* the string "a" as {"string": "a"}; and
* a Foo instance as {"Foo": {...}}, where {...} indicates the JSON encoding of
a Foo instance.
Instead, AvroAsTextInputFormat serializes these values as:
* null as null;
* the string "a" as "a"; and
* a Foo instance as {...}, where {...} indicates the JSON encoding of a Foo
instance.
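The difference between the two encodings can be sketched in plain Java. This is only an illustration using the standard library, not the Avro API: the class UnionJsonSketch, the methods wrapped() and bare(), and a Foo record with a single string field "bar" are all hypothetical names chosen for this example.

```java
public class UnionJsonSketch {

    // Hypothetical stand-in for an Avro record named "Foo" with one string field "bar".
    static final class Foo {
        final String bar;

        Foo(String bar) {
            this.bar = bar;
        }

        String toJson() {
            return "{\"bar\": " + quote(bar) + "}";
        }
    }

    // Minimal JSON string quoting (handles backslash and double quote only).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Bare form: the shape the report says AvroAsTextInputFormat produces.
    static String bare(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof String) {
            return quote((String) value);
        }
        if (value instanceof Foo) {
            return ((Foo) value).toJson();
        }
        throw new IllegalArgumentException("not a branch of [\"null\",\"string\",\"Foo\"]");
    }

    // Wrapped form: non-null union values become a one-entry object keyed by the type name.
    static String wrapped(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof String) {
            return "{\"string\": " + bare(value) + "}";
        }
        if (value instanceof Foo) {
            return "{\"Foo\": " + bare(value) + "}";
        }
        throw new IllegalArgumentException("not a branch of [\"null\",\"string\",\"Foo\"]");
    }

    public static void main(String[] args) {
        Object[] values = { null, "a", new Foo("x") };
        for (Object value : values) {
            System.out.println("spec: " + wrapped(value) + "    observed: " + bare(value));
        }
    }
}
```

Only the wrapped form records which branch of the union a value belongs to. In the bare form, a reader cannot tell from the text alone whether "a" came from the string branch or from some other branch that also renders as a JSON string.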
> AvroAsTextInputFormat is inconsistent with the Avro JSON Encoding described
> in the Avro Specification
> -----------------------------------------------------------------------------------------------------
>
> Key: AVRO-1456
> URL: https://issues.apache.org/jira/browse/AVRO-1456
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.6
> Reporter: Jamie Olson