[
https://issues.apache.org/jira/browse/AVRO-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485597#comment-14485597
]
Ryan Blue commented on AVRO-1661:
---------------------------------
Here's a bit of code that demonstrates how you should be reading those events:
{code:java}
// The writer's schema travels with the event; newSchema is the reader's
// (target) schema. Passing both lets Avro resolve them and fill in defaults.
Schema eventSchema = schema(event);
Decoder decoder = DecoderFactory.get().binaryDecoder(event.getBody(), null);
DatumReader<GenericRecord> reader =
    new GenericDatumReader<GenericRecord>(eventSchema, newSchema);
return reader.read(reuse, decoder);
{code}
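As a dependency-free illustration of the resolution rule that snippet relies on (the class name and the string-map stand-in for a record are mine, not Avro's API): a field present in the reader schema but absent from the written data is filled from its default, and resolution fails if no default exists. That is why the new field needs {{"default": "na"}}, and why the reader must see both schemas.
{code:java}
```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class ResolveSketch {
    // Reader-schema fields mapped to their defaults (null = no default).
    static final Map<String, String> READER_DEFAULTS = new LinkedHashMap<>();
    static {
        READER_DEFAULTS.put("a", null);
        READER_DEFAULTS.put("b", null);
        READER_DEFAULTS.put("c", "na"); // the new field, with a default
    }

    // Resolve a record written with the old schema against the new one:
    // keep written values, fill missing fields from their defaults, and
    // fail if a missing field has no default (mirrors Avro's rule).
    static Map<String, String> resolve(Map<String, String> written) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> f : READER_DEFAULTS.entrySet()) {
            if (written.containsKey(f.getKey())) {
                out.put(f.getKey(), written.get(f.getKey()));
            } else if (f.getValue() != null) {
                out.put(f.getKey(), f.getValue());
            } else {
                throw new IllegalStateException("no default for " + f.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A record written with the OLD schema: it has no "c" at all.
        Map<String, String> old = new HashMap<>();
        old.put("a", "hello");
        old.put("b", "world");
        System.out.println(resolve(old)); // prints {a=hello, b=world, c=na}
    }
}
```
{code}
Note this only works when Avro is handed the writer schema as well as the reader schema; with the reader schema alone, the decoder walks the bytes with the wrong field layout and fails, as in the EOFException below.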
This is from the [Flume DatasetSink|https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/parser/AvroParser.java#L156], which has the same problem of coordinating schema versions. Flume uses an event header to store the URL where the schema is stored in HDFS, and keeps a cache of readers, one per incoming schema. Hopefully that class is helpful.
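That caching idea can be sketched without any Avro dependency (the String value is a stand-in for the {{GenericDatumReader}} the real sink builds; {{computeIfAbsent}} keeps the expensive fetch-and-parse to one per schema URL):
{code:java}
```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ReaderCache {
    // In the real DatasetSink the value would be a GenericDatumReader built
    // from the writer schema fetched at this URL; a String stands in here
    // so the sketch stays dependency-free.
    private final Map<String, String> readers = new ConcurrentHashMap<>();
    private final AtomicInteger builds = new AtomicInteger();

    String readerFor(String schemaUrl) {
        return readers.computeIfAbsent(schemaUrl, url -> {
            builds.incrementAndGet(); // the expensive step: fetch + parse
            return "reader-for:" + url;
        });
    }

    int buildCount() { return builds.get(); }

    public static void main(String[] args) {
        ReaderCache cache = new ReaderCache();
        cache.readerFor("hdfs:///schemas/toto-v1.avsc");
        cache.readerFor("hdfs:///schemas/toto-v1.avsc"); // cache hit, no rebuild
        cache.readerFor("hdfs:///schemas/toto-v2.avsc");
        System.out.println(cache.buildCount()); // prints 2
    }
}
```
{code}
The schema URLs above are hypothetical; the point is only that each distinct writer schema costs one reader construction, no matter how many events carry it.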
> Schema Evolution not working
> -----------------------------
>
> Key: AVRO-1661
> URL: https://issues.apache.org/jira/browse/AVRO-1661
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.6, 1.7.7
> Environment: Ubuntu 14.10
> Reporter: Nicolas PHUNG
> Priority: Critical
> Labels: avsc, evolution, schema
>
> This is the Avro Schema (OLD) I was using to write Avro binary data before:
> {noformat}
> {
>   "namespace": "com.hello.world",
>   "type": "record",
>   "name": "Toto",
>   "fields": [
>     { "name": "a", "type": ["string", "null"] },
>     { "name": "b", "type": "string" }
>   ]
> }
> {noformat}
> This is the Avro Schema (NEW) I'm using to read the Avro binary data:
> {noformat}
> {
>   "namespace": "com.hello.world",
>   "type": "record",
>   "name": "Toto",
>   "fields": [
>     { "name": "a", "type": ["string", "null"] },
>     { "name": "b", "type": "string" },
>     { "name": "c", "type": "string", "default": "na" }
>   ]
> }
> {noformat}
> However, I can't read the old data with the new schema. I get the following errors:
> {noformat}
> 15/04/08 17:32:22 ERROR executor.Executor: Exception in task 0.0 in stage 3.0 (TID 3)
> java.io.EOFException
>     at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
>     at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
>     at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:259)
>     at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:272)
>     at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:113)
>     at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:353)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:157)
>     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>     at com.miguno.kafka.avro.AvroDecoder.fromBytes(AvroDecoder.scala:31)
> {noformat}
> From my understanding, I should be able to read the old data with the new schema, since the only change is a new field with a default value. But it doesn't seem to work. Am I doing something wrong?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)