[
https://issues.apache.org/jira/browse/AVRO-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915942#comment-13915942
]
Jim Pivarski commented on AVRO-1467:
------------------------------------
I've tested aliases, and they work correctly; there's no need to open a new
ticket.
Specifically, I tested enums and fixed, with and without a namespace on the
writer, with and without an alias on the reader, with relative and
fully-qualified aliases (18 cases). Each case behaved as expected. (I must
have gotten my alias-related tests crossed while I was focusing on record
names.)
For the record, I'll post my tests here.
{code:java}
import org.apache.avro.file.DataFileReader
import org.apache.avro.file.DataFileWriter
import org.apache.avro.generic.GenericData
import org.apache.avro.generic.GenericDatumReader
import org.apache.avro.generic.GenericDatumWriter
import org.apache.avro.io.DatumReader
import org.apache.avro.io.DatumWriter
import org.apache.avro.Schema

// Enum, writer without a namespace.
var writerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Writer", "symbols": ["one", "two", "three"]}""")
var datumWriter = new GenericDatumWriter[GenericData.EnumSymbol](writerSchema)
var dataFileWriter = new DataFileWriter[GenericData.EnumSymbol](datumWriter)
dataFileWriter.create(writerSchema, new java.io.File("/tmp/test.avro"))
dataFileWriter.append(new GenericData.EnumSymbol(writerSchema, "one"))
dataFileWriter.append(new GenericData.EnumSymbol(writerSchema, "two"))
dataFileWriter.append(new GenericData.EnumSymbol(writerSchema, "three"))
dataFileWriter.close()

var readerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Reader", "symbols": ["one", "two", "three"]}""")
var datumReader = new GenericDatumReader[GenericData.EnumSymbol](writerSchema, readerSchema)
var dataFileReader = new DataFileReader[GenericData.EnumSymbol](new java.io.File("/tmp/test.avro"), datumReader)
while (dataFileReader.hasNext) {
  val in = dataFileReader.next()
  println(in, in.getSchema)
}
// fails (good)

readerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Reader", "symbols": ["one", "two", "three"], "aliases": ["Writer"]}""")
datumReader = new GenericDatumReader[GenericData.EnumSymbol](writerSchema, readerSchema)
dataFileReader = new DataFileReader[GenericData.EnumSymbol](new java.io.File("/tmp/test.avro"), datumReader)
while (dataFileReader.hasNext) {
  val in = dataFileReader.next()
  println(in, in.getSchema)
}
// succeeds (good)

readerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Reader", "symbols": ["one", "two", "three"], "aliases": ["NOTWRITER"]}""")
datumReader = new GenericDatumReader[GenericData.EnumSymbol](writerSchema, readerSchema)
dataFileReader = new DataFileReader[GenericData.EnumSymbol](new java.io.File("/tmp/test.avro"), datumReader)
while (dataFileReader.hasNext) {
  val in = dataFileReader.next()
  println(in, in.getSchema)
}
// fails (good)

// Enum, writer with a namespace.
writerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Writer", "namespace": "com.wowie", "symbols": ["one", "two", "three"]}""")
datumWriter = new GenericDatumWriter[GenericData.EnumSymbol](writerSchema)
dataFileWriter = new DataFileWriter[GenericData.EnumSymbol](datumWriter)
dataFileWriter.create(writerSchema, new java.io.File("/tmp/test2.avro"))
dataFileWriter.append(new GenericData.EnumSymbol(writerSchema, "one"))
dataFileWriter.append(new GenericData.EnumSymbol(writerSchema, "two"))
dataFileWriter.append(new GenericData.EnumSymbol(writerSchema, "three"))
dataFileWriter.close()

readerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Reader", "symbols": ["one", "two", "three"], "aliases": ["Writer"]}""")
datumReader = new GenericDatumReader[GenericData.EnumSymbol](writerSchema, readerSchema)
dataFileReader = new DataFileReader[GenericData.EnumSymbol](new java.io.File("/tmp/test2.avro"), datumReader)
while (dataFileReader.hasNext) {
  val in = dataFileReader.next()
  println(in, in.getSchema)
}
// fails (good)

readerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Reader", "symbols": ["one", "two", "three"], "aliases": ["com.wowie.Writer"]}""")
datumReader = new GenericDatumReader[GenericData.EnumSymbol](writerSchema, readerSchema)
dataFileReader = new DataFileReader[GenericData.EnumSymbol](new java.io.File("/tmp/test2.avro"), datumReader)
while (dataFileReader.hasNext) {
  val in = dataFileReader.next()
  println(in, in.getSchema)
}
// succeeds (good)

readerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Reader", "symbols": ["one", "two", "three"], "aliases": ["com.wowie.NOTWRITER"]}""")
datumReader = new GenericDatumReader[GenericData.EnumSymbol](writerSchema, readerSchema)
dataFileReader = new DataFileReader[GenericData.EnumSymbol](new java.io.File("/tmp/test2.avro"), datumReader)
while (dataFileReader.hasNext) {
  val in = dataFileReader.next()
  println(in, in.getSchema)
}
// fails (good)

readerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Reader", "symbols": ["one", "two", "three"], "aliases": ["com.notnamespace.Writer"]}""")
datumReader = new GenericDatumReader[GenericData.EnumSymbol](writerSchema, readerSchema)
dataFileReader = new DataFileReader[GenericData.EnumSymbol](new java.io.File("/tmp/test2.avro"), datumReader)
while (dataFileReader.hasNext) {
  val in = dataFileReader.next()
  println(in, in.getSchema)
}
// fails (good)

readerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Reader", "namespace": "com.wowie", "symbols": ["one", "two", "three"], "aliases": ["Writer"]}""")
datumReader = new GenericDatumReader[GenericData.EnumSymbol](writerSchema, readerSchema)
dataFileReader = new DataFileReader[GenericData.EnumSymbol](new java.io.File("/tmp/test2.avro"), datumReader)
while (dataFileReader.hasNext) {
  val in = dataFileReader.next()
  println(in, in.getSchema)
}
// succeeds (good)

readerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Reader", "namespace": "com.wowie", "symbols": ["one", "two", "three"], "aliases": ["com.wowie.Writer"]}""")
datumReader = new GenericDatumReader[GenericData.EnumSymbol](writerSchema, readerSchema)
dataFileReader = new DataFileReader[GenericData.EnumSymbol](new java.io.File("/tmp/test2.avro"), datumReader)
while (dataFileReader.hasNext) {
  val in = dataFileReader.next()
  println(in, in.getSchema)
}
// succeeds (good)

// Fixed, writer without a namespace.
var writerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": "Writer", "size": 10}""")
var datumWriter2 = new GenericDatumWriter[GenericData.Fixed](writerSchema2)
var dataFileWriter2 = new DataFileWriter[GenericData.Fixed](datumWriter2)
dataFileWriter2.create(writerSchema2, new java.io.File("/tmp/test3.avro"))
dataFileWriter2.append(new GenericData.Fixed(writerSchema2, "hellohello".getBytes))
dataFileWriter2.append(new GenericData.Fixed(writerSchema2, "hellohello".getBytes))
dataFileWriter2.append(new GenericData.Fixed(writerSchema2, "hellohello".getBytes))
dataFileWriter2.close()

var readerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": "Reader", "size": 10}""")
var datumReader2 = new GenericDatumReader[GenericData.Fixed](writerSchema2, readerSchema2)
var dataFileReader2 = new DataFileReader[GenericData.Fixed](new java.io.File("/tmp/test3.avro"), datumReader2)
while (dataFileReader2.hasNext) {
  val in = dataFileReader2.next()
  println(in, in.getSchema)
}
// fails (good)

readerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": "Reader", "size": 10, "aliases": ["Writer"]}""")
datumReader2 = new GenericDatumReader[GenericData.Fixed](writerSchema2, readerSchema2)
dataFileReader2 = new DataFileReader[GenericData.Fixed](new java.io.File("/tmp/test3.avro"), datumReader2)
while (dataFileReader2.hasNext) {
  val in = dataFileReader2.next()
  println(in, in.getSchema)
}
// succeeds (good)

readerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": "Reader", "size": 10, "aliases": ["NOTWRITER"]}""")
datumReader2 = new GenericDatumReader[GenericData.Fixed](writerSchema2, readerSchema2)
dataFileReader2 = new DataFileReader[GenericData.Fixed](new java.io.File("/tmp/test3.avro"), datumReader2)
while (dataFileReader2.hasNext) {
  val in = dataFileReader2.next()
  println(in, in.getSchema)
}
// fails (good)

// Fixed, writer with a namespace.
writerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": "Writer", "namespace": "com.wowie", "size": 10}""")
datumWriter2 = new GenericDatumWriter[GenericData.Fixed](writerSchema2)
dataFileWriter2 = new DataFileWriter[GenericData.Fixed](datumWriter2)
dataFileWriter2.create(writerSchema2, new java.io.File("/tmp/test4.avro"))
dataFileWriter2.append(new GenericData.Fixed(writerSchema2, "hellohello".getBytes))
dataFileWriter2.append(new GenericData.Fixed(writerSchema2, "hellohello".getBytes))
dataFileWriter2.append(new GenericData.Fixed(writerSchema2, "hellohello".getBytes))
dataFileWriter2.close()

readerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": "Reader", "size": 10, "aliases": ["Writer"]}""")
datumReader2 = new GenericDatumReader[GenericData.Fixed](writerSchema2, readerSchema2)
dataFileReader2 = new DataFileReader[GenericData.Fixed](new java.io.File("/tmp/test4.avro"), datumReader2)
while (dataFileReader2.hasNext) {
  val in = dataFileReader2.next()
  println(in, in.getSchema)
}
// fails (good)

readerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": "Reader", "size": 10, "aliases": ["com.wowie.Writer"]}""")
datumReader2 = new GenericDatumReader[GenericData.Fixed](writerSchema2, readerSchema2)
dataFileReader2 = new DataFileReader[GenericData.Fixed](new java.io.File("/tmp/test4.avro"), datumReader2)
while (dataFileReader2.hasNext) {
  val in = dataFileReader2.next()
  println(in, in.getSchema)
}
// succeeds (good)

readerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": "Reader", "size": 10, "aliases": ["com.wowie.NOTWRITER"]}""")
datumReader2 = new GenericDatumReader[GenericData.Fixed](writerSchema2, readerSchema2)
dataFileReader2 = new DataFileReader[GenericData.Fixed](new java.io.File("/tmp/test4.avro"), datumReader2)
while (dataFileReader2.hasNext) {
  val in = dataFileReader2.next()
  println(in, in.getSchema)
}
// fails (good)

readerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": "Reader", "size": 10, "aliases": ["com.notnamespace.Writer"]}""")
datumReader2 = new GenericDatumReader[GenericData.Fixed](writerSchema2, readerSchema2)
dataFileReader2 = new DataFileReader[GenericData.Fixed](new java.io.File("/tmp/test4.avro"), datumReader2)
while (dataFileReader2.hasNext) {
  val in = dataFileReader2.next()
  println(in, in.getSchema)
}
// fails (good)

readerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": "Reader", "namespace": "com.wowie", "size": 10, "aliases": ["Writer"]}""")
datumReader2 = new GenericDatumReader[GenericData.Fixed](writerSchema2, readerSchema2)
dataFileReader2 = new DataFileReader[GenericData.Fixed](new java.io.File("/tmp/test4.avro"), datumReader2)
while (dataFileReader2.hasNext) {
  val in = dataFileReader2.next()
  println(in, in.getSchema)
}
// succeeds (good)

readerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": "Reader", "namespace": "com.wowie", "size": 10, "aliases": ["com.wowie.Writer"]}""")
datumReader2 = new GenericDatumReader[GenericData.Fixed](writerSchema2, readerSchema2)
dataFileReader2 = new DataFileReader[GenericData.Fixed](new java.io.File("/tmp/test4.avro"), datumReader2)
while (dataFileReader2.hasNext) {
  val in = dataFileReader2.next()
  println(in, in.getSchema)
}
// succeeds (good)
{code}
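In case it's useful, the rule these 18 cases exercise can be condensed into a few lines of plain Python. This is only a sketch of my reading of the spec's name/alias matching, not Avro's actual implementation, and the helper names are made up:

```python
# Sketch (not Avro's code) of the spec's rule for matching named types:
# a reader type matches the writer's if their full names are equal, or if
# the writer's full name is among the reader's aliases; an unqualified
# alias is resolved against the reader's namespace.

def full_name(name, namespace=None):
    # A name containing a dot is already fully qualified.
    if "." in name or namespace is None:
        return name
    return namespace + "." + name

def names_match(writer, reader, aliases=()):
    # writer and reader are (name, namespace) pairs; aliases are the reader's.
    w = full_name(*writer)
    if full_name(*reader) == w:
        return True
    return any(full_name(a, reader[1]) == w for a in aliases)

# The enum cases above, in miniature:
print(names_match(("Writer", None), ("Reader", None)))                    # False: fails (good)
print(names_match(("Writer", None), ("Reader", None), ["Writer"]))        # True: succeeds (good)
print(names_match(("Writer", "com.wowie"), ("Reader", None), ["Writer"])) # False: fails (good)
print(names_match(("Writer", "com.wowie"), ("Reader", None), ["com.wowie.Writer"]))  # True
print(names_match(("Writer", "com.wowie"), ("Reader", "com.wowie"), ["Writer"]))     # True
```

Every "fails (good)" / "succeeds (good)" outcome above is consistent with this reading.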
> Schema resolution does not check record names
> ---------------------------------------------
>
> Key: AVRO-1467
> URL: https://issues.apache.org/jira/browse/AVRO-1467
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.6
> Reporter: Jim Pivarski
> Fix For: 1.8.0
>
>
> According to http://avro.apache.org/docs/1.7.6/spec.html#Schema+Resolution ,
> writer and reader schemas should be considered compatible if they (1) have
> the same name and (2) the reader requests a subset of the writer's fields
> with compatible types. In the Java version, I find that the structure of the
> fields is checked but the name is _not_ checked. (It's too permissive; it
> acts like a structural type check rather than a structural-and-nominal one.)
> Here's a demonstration (in the Scala REPL to allow for experimentation;
> launch with "scala -cp avro-tools-1.7.6.jar" to get all the classes). The
> following writes a small, valid Avro data file:
> {code:java}
> import org.apache.avro.file.DataFileReader
> import org.apache.avro.file.DataFileWriter
> import org.apache.avro.generic.GenericData
> import org.apache.avro.generic.GenericDatumReader
> import org.apache.avro.generic.GenericDatumWriter
> import org.apache.avro.generic.GenericRecord
> import org.apache.avro.io.DatumReader
> import org.apache.avro.io.DatumWriter
> import org.apache.avro.Schema
> val parser = new Schema.Parser
> // The name is different but the fields are the same.
> val writerSchema = parser.parse("""{"type": "record", "name": "Writer", "fields": [{"name": "one", "type": "int"}, {"name": "two", "type": "string"}]}""")
> val readerSchema = parser.parse("""{"type": "record", "name": "Reader", "fields": [{"name": "one", "type": "int"}, {"name": "two", "type": "string"}]}""")
> def makeRecord(one: Int, two: String): GenericRecord = {
>   val out = new GenericData.Record(writerSchema)
>   out.put("one", one)
>   out.put("two", two)
>   out
> }
> val datumWriter = new GenericDatumWriter[GenericRecord](writerSchema)
> val dataFileWriter = new DataFileWriter[GenericRecord](datumWriter)
> dataFileWriter.create(writerSchema, new java.io.File("/tmp/test.avro"))
> dataFileWriter.append(makeRecord(1, "one"))
> dataFileWriter.append(makeRecord(2, "two"))
> dataFileWriter.append(makeRecord(3, "three"))
> dataFileWriter.close()
> {code}
> Looking at the output with "hexdump -C /tmp/test.avro", we see that the
> writer schema is embedded in the file, and the record's name is "Writer". To
> read it back:
> {code:java}
> val datumReader = new GenericDatumReader[GenericRecord](writerSchema, readerSchema)
> val dataFileReader = new DataFileReader[GenericRecord](new java.io.File("/tmp/test.avro"), datumReader)
> while (dataFileReader.hasNext) {
>   val in = dataFileReader.next()
>   println(in, in.getSchema)
> }
> {code}
> The problem is that the above is successful, even though I'm requesting a
> record with name "Reader".
> If I make structurally incompatible records, for instance by writing with
> "Writer.two" being an integer and "Reader.two" being a string, it fails to
> read with org.apache.avro.AvroTypeException (as it should). If I try the
> above test with an enum type or a fixed type, it _does_ require the writer
> and reader names to match: record is the only named type for which the name
> is ignored during schema resolution.
> We're supposed to use aliases to explicitly declare which structurally
> compatible writer-reader combinations to accept. Because of the above bug,
> differently named records are accepted regardless of their aliases, but enums
> and fixed types are not accepted, even if they have the right aliases. This
> may be a separate bug, or it may be related to the above.
> To make sure that I'm correctly understanding the specification, I tried
> exactly the same thing in the Python version:
> {code:python}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> writerSchema = avro.schema.parse('{"type": "record", "name": "Writer", "fields": [{"name": "one", "type": "int"}, {"name": "two", "type": "string"}]}')
> readerSchema = avro.schema.parse('{"type": "record", "name": "Reader", "fields": [{"name": "one", "type": "int"}, {"name": "two", "type": "string"}]}')
> writer = DataFileWriter(open("/tmp/test2.avro", "w"), DatumWriter(), writerSchema)
> writer.append({"one": 1, "two": "one"})
> writer.append({"one": 2, "two": "two"})
> writer.append({"one": 3, "two": "three"})
> writer.close()
> reader = DataFileReader(open("/tmp/test2.avro"), DatumReader(None, readerSchema))
> for datum in reader:
>     print datum
> {code}
> The Python code fails in the first read with
> avro.io.SchemaResolutionException, as it is supposed to. (Interestingly,
> Python ignores the aliases as well, which I think it's not supposed to do.
> Since the Java and Python versions both have the same behavior with regard to
> aliases, I wonder if I'm understanding
> http://avro.apache.org/docs/1.7.6/spec.html#Aliases correctly.)
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)