[ https://issues.apache.org/jira/browse/AVRO-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915942#comment-13915942 ]

Jim Pivarski commented on AVRO-1467:
------------------------------------

I've tested aliases, and they work completely; there's no need to open a new
ticket.

Specifically, I tested enum and fixed types, with and without a namespace on
the writer, with and without an alias on the reader, and with relative and
fully-qualified aliases (18 cases).  Each case behaved as expected.  (I must
have gotten my alias-related tests crossed while I was focusing on record
names.)

For the record, I'll post my tests here.
{code:java}
import org.apache.avro.file.DataFileReader
import org.apache.avro.file.DataFileWriter
import org.apache.avro.generic.GenericData
import org.apache.avro.generic.GenericDatumReader
import org.apache.avro.generic.GenericDatumWriter
import org.apache.avro.io.DatumReader
import org.apache.avro.io.DatumWriter
import org.apache.avro.Schema

var writerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": 
"Writer", "symbols": ["one", "two", "three"]}""")
var datumWriter = new GenericDatumWriter[GenericData.EnumSymbol](writerSchema)
var dataFileWriter = new DataFileWriter[GenericData.EnumSymbol](datumWriter)
dataFileWriter.create(writerSchema, new java.io.File("/tmp/test.avro"))
dataFileWriter.append(new GenericData.EnumSymbol(writerSchema, "one"))
dataFileWriter.append(new GenericData.EnumSymbol(writerSchema, "two"))
dataFileWriter.append(new GenericData.EnumSymbol(writerSchema, "three"))
dataFileWriter.close()

var readerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": 
"Reader", "symbols": ["one", "two", "three"]}""")
var datumReader = new GenericDatumReader[GenericData.EnumSymbol](writerSchema, 
readerSchema)
var dataFileReader = new DataFileReader[GenericData.EnumSymbol](new 
java.io.File("/tmp/test.avro"), datumReader)
while (dataFileReader.hasNext) {
  val in = dataFileReader.next()
  println(in, in.getSchema)
}
// fails (good)

readerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Reader", 
"symbols": ["one", "two", "three"], "aliases": ["Writer"]}""")
datumReader = new GenericDatumReader[GenericData.EnumSymbol](writerSchema, 
readerSchema)
dataFileReader = new DataFileReader[GenericData.EnumSymbol](new 
java.io.File("/tmp/test.avro"), datumReader)
while (dataFileReader.hasNext) {
  val in = dataFileReader.next()
  println(in, in.getSchema)
}
// succeeds (good)

readerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Reader", 
"symbols": ["one", "two", "three"], "aliases": ["NOTWRITER"]}""")
datumReader = new GenericDatumReader[GenericData.EnumSymbol](writerSchema, 
readerSchema)
dataFileReader = new DataFileReader[GenericData.EnumSymbol](new 
java.io.File("/tmp/test.avro"), datumReader)
while (dataFileReader.hasNext) {
  val in = dataFileReader.next()
  println(in, in.getSchema)
}
// fails (good)

writerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Writer", 
"namespace": "com.wowie", "symbols": ["one", "two", "three"]}""")
datumWriter = new GenericDatumWriter[GenericData.EnumSymbol](writerSchema)
dataFileWriter = new DataFileWriter[GenericData.EnumSymbol](datumWriter)
dataFileWriter.create(writerSchema, new java.io.File("/tmp/test2.avro"))
dataFileWriter.append(new GenericData.EnumSymbol(writerSchema, "one"))
dataFileWriter.append(new GenericData.EnumSymbol(writerSchema, "two"))
dataFileWriter.append(new GenericData.EnumSymbol(writerSchema, "three"))
dataFileWriter.close()

readerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Reader", 
"symbols": ["one", "two", "three"], "aliases": ["Writer"]}""")
datumReader = new GenericDatumReader[GenericData.EnumSymbol](writerSchema, 
readerSchema)
dataFileReader = new DataFileReader[GenericData.EnumSymbol](new 
java.io.File("/tmp/test2.avro"), datumReader)
while (dataFileReader.hasNext) {
  val in = dataFileReader.next()
  println(in, in.getSchema)
}
// fails (good): with no reader namespace, the alias "Writer" means "Writer", not "com.wowie.Writer"

readerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Reader", 
"symbols": ["one", "two", "three"], "aliases": ["com.wowie.Writer"]}""")
datumReader = new GenericDatumReader[GenericData.EnumSymbol](writerSchema, 
readerSchema)
dataFileReader = new DataFileReader[GenericData.EnumSymbol](new 
java.io.File("/tmp/test2.avro"), datumReader)
while (dataFileReader.hasNext) {
  val in = dataFileReader.next()
  println(in, in.getSchema)
}
// succeeds (good)

readerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Reader", 
"symbols": ["one", "two", "three"], "aliases": ["com.wowie.NOTWRITER"]}""")
datumReader = new GenericDatumReader[GenericData.EnumSymbol](writerSchema, 
readerSchema)
dataFileReader = new DataFileReader[GenericData.EnumSymbol](new 
java.io.File("/tmp/test2.avro"), datumReader)
while (dataFileReader.hasNext) {
  val in = dataFileReader.next()
  println(in, in.getSchema)
}
// fails (good)

readerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Reader", 
"symbols": ["one", "two", "three"], "aliases": ["com.notnamespace.Writer"]}""")
datumReader = new GenericDatumReader[GenericData.EnumSymbol](writerSchema, 
readerSchema)
dataFileReader = new DataFileReader[GenericData.EnumSymbol](new 
java.io.File("/tmp/test2.avro"), datumReader)
while (dataFileReader.hasNext) {
  val in = dataFileReader.next()
  println(in, in.getSchema)
}
// fails (good)

readerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Reader", 
"namespace": "com.wowie", "symbols": ["one", "two", "three"], "aliases": 
["Writer"]}""")
datumReader = new GenericDatumReader[GenericData.EnumSymbol](writerSchema, 
readerSchema)
dataFileReader = new DataFileReader[GenericData.EnumSymbol](new 
java.io.File("/tmp/test2.avro"), datumReader)
while (dataFileReader.hasNext) {
  val in = dataFileReader.next()
  println(in, in.getSchema)
}
// succeeds (good): the relative alias "Writer" is resolved in the reader's namespace, giving "com.wowie.Writer"

readerSchema = (new Schema.Parser).parse("""{"type": "enum", "name": "Reader", 
"namespace": "com.wowie", "symbols": ["one", "two", "three"], "aliases": 
["com.wowie.Writer"]}""")
datumReader = new GenericDatumReader[GenericData.EnumSymbol](writerSchema, 
readerSchema)
dataFileReader = new DataFileReader[GenericData.EnumSymbol](new 
java.io.File("/tmp/test2.avro"), datumReader)
while (dataFileReader.hasNext) {
  val in = dataFileReader.next()
  println(in, in.getSchema)
}
// succeeds (good)

var writerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": 
"Writer", "size": 10}""")
var datumWriter2 = new GenericDatumWriter[GenericData.Fixed](writerSchema2)
var dataFileWriter2 = new DataFileWriter[GenericData.Fixed](datumWriter2)
dataFileWriter2.create(writerSchema2, new java.io.File("/tmp/test3.avro"))
dataFileWriter2.append(new GenericData.Fixed(writerSchema2, 
"hellohello".getBytes))
dataFileWriter2.append(new GenericData.Fixed(writerSchema2, 
"hellohello".getBytes))
dataFileWriter2.append(new GenericData.Fixed(writerSchema2, 
"hellohello".getBytes))
dataFileWriter2.close()

var readerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": 
"Reader", "size": 10}""")
var datumReader2 = new GenericDatumReader[GenericData.Fixed](writerSchema2, 
readerSchema2)
var dataFileReader2 = new DataFileReader[GenericData.Fixed](new 
java.io.File("/tmp/test3.avro"), datumReader2)
while (dataFileReader2.hasNext) {
  val in = dataFileReader2.next()
  println(in, in.getSchema)
}
// fails (good)

readerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": 
"Reader", "size": 10, "aliases": ["Writer"]}""")
datumReader2 = new GenericDatumReader[GenericData.Fixed](writerSchema2, 
readerSchema2)
dataFileReader2 = new DataFileReader[GenericData.Fixed](new 
java.io.File("/tmp/test3.avro"), datumReader2)
while (dataFileReader2.hasNext) {
  val in = dataFileReader2.next()
  println(in, in.getSchema)
}
// succeeds (good)

readerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": 
"Reader", "size": 10, "aliases": ["NOTWRITER"]}""")
datumReader2 = new GenericDatumReader[GenericData.Fixed](writerSchema2, 
readerSchema2)
dataFileReader2 = new DataFileReader[GenericData.Fixed](new 
java.io.File("/tmp/test3.avro"), datumReader2)
while (dataFileReader2.hasNext) {
  val in = dataFileReader2.next()
  println(in, in.getSchema)
}
// fails (good)

writerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": 
"Writer", "namespace": "com.wowie", "size": 10}""")
datumWriter2 = new GenericDatumWriter[GenericData.Fixed](writerSchema2)
dataFileWriter2 = new DataFileWriter[GenericData.Fixed](datumWriter2)
dataFileWriter2.create(writerSchema2, new java.io.File("/tmp/test4.avro"))
dataFileWriter2.append(new GenericData.Fixed(writerSchema2, 
"hellohello".getBytes))
dataFileWriter2.append(new GenericData.Fixed(writerSchema2, 
"hellohello".getBytes))
dataFileWriter2.append(new GenericData.Fixed(writerSchema2, 
"hellohello".getBytes))
dataFileWriter2.close()

readerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": 
"Reader", "size": 10, "aliases": ["Writer"]}""")
datumReader2 = new GenericDatumReader[GenericData.Fixed](writerSchema2, 
readerSchema2)
dataFileReader2 = new DataFileReader[GenericData.Fixed](new 
java.io.File("/tmp/test4.avro"), datumReader2)
while (dataFileReader2.hasNext) {
  val in = dataFileReader2.next()
  println(in, in.getSchema)
}
// fails (good): with no reader namespace, the alias "Writer" means "Writer", not "com.wowie.Writer"

readerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": 
"Reader", "size": 10, "aliases": ["com.wowie.Writer"]}""")
datumReader2 = new GenericDatumReader[GenericData.Fixed](writerSchema2, 
readerSchema2)
dataFileReader2 = new DataFileReader[GenericData.Fixed](new 
java.io.File("/tmp/test4.avro"), datumReader2)
while (dataFileReader2.hasNext) {
  val in = dataFileReader2.next()
  println(in, in.getSchema)
}
// succeeds (good)

readerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": 
"Reader", "size": 10, "aliases": ["com.wowie.NOTWRITER"]}""")
datumReader2 = new GenericDatumReader[GenericData.Fixed](writerSchema2, 
readerSchema2)
dataFileReader2 = new DataFileReader[GenericData.Fixed](new 
java.io.File("/tmp/test4.avro"), datumReader2)
while (dataFileReader2.hasNext) {
  val in = dataFileReader2.next()
  println(in, in.getSchema)
}
// fails (good)

readerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": 
"Reader", "size": 10, "aliases": ["com.notnamespace.Writer"]}""")
datumReader2 = new GenericDatumReader[GenericData.Fixed](writerSchema2, 
readerSchema2)
dataFileReader2 = new DataFileReader[GenericData.Fixed](new 
java.io.File("/tmp/test4.avro"), datumReader2)
while (dataFileReader2.hasNext) {
  val in = dataFileReader2.next()
  println(in, in.getSchema)
}
// fails (good)

readerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": 
"Reader", "namespace": "com.wowie", "size": 10, "aliases": ["Writer"]}""")
datumReader2 = new GenericDatumReader[GenericData.Fixed](writerSchema2, 
readerSchema2)
dataFileReader2 = new DataFileReader[GenericData.Fixed](new 
java.io.File("/tmp/test4.avro"), datumReader2)
while (dataFileReader2.hasNext) {
  val in = dataFileReader2.next()
  println(in, in.getSchema)
}
// succeeds (good): the relative alias "Writer" is resolved in the reader's namespace, giving "com.wowie.Writer"

readerSchema2 = (new Schema.Parser).parse("""{"type": "fixed", "name": 
"Reader", "namespace": "com.wowie", "size": 10, "aliases": 
["com.wowie.Writer"]}""")
datumReader2 = new GenericDatumReader[GenericData.Fixed](writerSchema2, 
readerSchema2)
dataFileReader2 = new DataFileReader[GenericData.Fixed](new 
java.io.File("/tmp/test4.avro"), datumReader2)
while (dataFileReader2.hasNext) {
  val in = dataFileReader2.next()
  println(in, in.getSchema)
}
// succeeds (good)
{code}
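The 18 outcomes above are consistent with the rule in the spec's aliases section: an alias without a dot is relative to the namespace of the name it aliases (here, the reader's namespace), and resolution succeeds when the writer's full name equals the reader's full name or one of its resolved aliases. The sketch below is a hypothetical model of that rule, not Avro's API (the helpers full_name and names_match are invented for illustration), checked against the nine enum cases; the fixed cases behave the same way.

{code:python}
# Hypothetical model of Avro named-type matching with aliases.
# This is NOT the Avro library API; full_name() and names_match() are
# illustrative helpers only.


def full_name(name, namespace):
    """Qualify `name` with `namespace` unless it already contains a dot."""
    if "." in name or not namespace:
        return name
    return namespace + "." + name


def names_match(writer_name, writer_ns, reader_name, reader_ns, reader_aliases=()):
    """True if the writer's full name equals the reader's full name or one of
    the reader's aliases; relative aliases use the reader's namespace."""
    writer_full = full_name(writer_name, writer_ns)
    accepted = {full_name(reader_name, reader_ns)}
    accepted.update(full_name(alias, reader_ns) for alias in reader_aliases)
    return writer_full in accepted


# (writer namespace, reader namespace, reader aliases, expected outcome),
# mirroring the nine enum cases in the tests above.
CASES = [
    (None, None, [], False),
    (None, None, ["Writer"], True),
    (None, None, ["NOTWRITER"], False),
    ("com.wowie", None, ["Writer"], False),
    ("com.wowie", None, ["com.wowie.Writer"], True),
    ("com.wowie", None, ["com.wowie.NOTWRITER"], False),
    ("com.wowie", None, ["com.notnamespace.Writer"], False),
    ("com.wowie", "com.wowie", ["Writer"], True),
    ("com.wowie", "com.wowie", ["com.wowie.Writer"], True),
]

for writer_ns, reader_ns, aliases, expected in CASES:
    actual = names_match("Writer", writer_ns, "Reader", reader_ns, aliases)
    print(writer_ns, reader_ns, aliases, "succeeds" if actual else "fails")
    assert actual == expected
{code}

Under this model, the short alias "Writer" only matches com.wowie.Writer when the reader itself is declared in com.wowie, which is exactly the difference between the fourth and eighth cases.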


> Schema resolution does not check record names
> ---------------------------------------------
>
>                 Key: AVRO-1467
>                 URL: https://issues.apache.org/jira/browse/AVRO-1467
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.6
>            Reporter: Jim Pivarski
>             Fix For: 1.8.0
>
>
> According to http://avro.apache.org/docs/1.7.6/spec.html#Schema+Resolution , 
> writer and reader schemas should be considered compatible if they (1) have 
> the same name and (2) the reader requests a subset of the writer's fields 
> with compatible types.  In the Java version, I find that the structure of the 
> fields is checked but the name is _not_ checked.  (It's too permissive: it 
> acts like a structural type check rather than a structural and nominal one.)
> Here's a demonstration (in the Scala REPL to allow for experimentation; 
> launch with "scala -cp avro-tools-1.7.6.jar" to get all the classes).  The 
> following writes a small, valid Avro data file:
> {code:java}
> import org.apache.avro.file.DataFileReader
> import org.apache.avro.file.DataFileWriter
> import org.apache.avro.generic.GenericData
> import org.apache.avro.generic.GenericDatumReader
> import org.apache.avro.generic.GenericDatumWriter
> import org.apache.avro.generic.GenericRecord
> import org.apache.avro.io.DatumReader
> import org.apache.avro.io.DatumWriter
> import org.apache.avro.Schema
> val parser = new Schema.Parser
> // The name is different but the fields are the same.
> val writerSchema = parser.parse("""{"type": "record", "name": "Writer", 
> "fields": [{"name": "one", "type": "int"}, {"name": "two", "type": 
> "string"}]}""")
> val readerSchema = parser.parse("""{"type": "record", "name": "Reader", 
> "fields": [{"name": "one", "type": "int"}, {"name": "two", "type": 
> "string"}]}""")
> def makeRecord(one: Int, two: String): GenericRecord = {
>   val out = new GenericData.Record(writerSchema)
>   out.put("one", one)
>   out.put("two", two)
>   out
> }
> val datumWriter = new GenericDatumWriter[GenericRecord](writerSchema)
> val dataFileWriter = new DataFileWriter[GenericRecord](datumWriter)
> dataFileWriter.create(writerSchema, new java.io.File("/tmp/test.avro"))
> dataFileWriter.append(makeRecord(1, "one"))
> dataFileWriter.append(makeRecord(2, "two"))
> dataFileWriter.append(makeRecord(3, "three"))
> dataFileWriter.close()
> {code}
> Looking at the output with "hexdump -C /tmp/test.avro", we see that the 
> writer schema is embedded in the file, and the record's name is "Writer".  To 
> read it back:
> {code:java}
> val datumReader = new GenericDatumReader[GenericRecord](writerSchema, 
> readerSchema)
> val dataFileReader = new DataFileReader[GenericRecord](new 
> java.io.File("/tmp/test.avro"), datumReader)
> while (dataFileReader.hasNext) {
>   val in = dataFileReader.next()
>   println(in, in.getSchema)
> }
> {code}
> The problem is that the above is successful, even though I'm requesting a 
> record with name "Reader".
> If I make structurally incompatible records, for instance by writing with 
> "Writer.two" being an integer and "Reader.two" being a string, it fails to 
> read with org.apache.avro.AvroTypeException (as it should).  If I try the 
> above test with an enum type or a fixed type, it _does_ require the writer 
> and reader names to match: record is the only named type for which the name 
> is ignored during schema resolution.
> We're supposed to use aliases to explicitly declare which structurally 
> compatible writer-reader combinations to accept.  Because of the above bug, 
> differently named records are accepted regardless of their aliases, but enums 
> and fixed types are not accepted, even if they have the right aliases.  This 
> may be a separate bug, or it may be related to the above.
> To make sure that I'm correctly understanding the specification, I tried 
> exactly the same thing in the Python version:
> {code:python}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> writerSchema = avro.schema.parse(
>     '{"type": "record", "name": "Writer", "fields": '
>     '[{"name": "one", "type": "int"}, {"name": "two", "type": "string"}]}')
> readerSchema = avro.schema.parse(
>     '{"type": "record", "name": "Reader", "fields": '
>     '[{"name": "one", "type": "int"}, {"name": "two", "type": "string"}]}')
> writer = DataFileWriter(open("/tmp/test2.avro", "wb"), DatumWriter(), 
>                         writerSchema)
> writer.append({"one": 1, "two": "one"})
> writer.append({"one": 2, "two": "two"})
> writer.append({"one": 3, "two": "three"})
> writer.close()
> reader = DataFileReader(open("/tmp/test2.avro", "rb"), DatumReader(None, 
> readerSchema))
> for datum in reader:
>     print(datum)
> {code}
> The Python code fails in the first read with 
> avro.io.SchemaResolutionException, as it is supposed to.  (Interestingly, 
> Python ignores the aliases as well, which I think it's not supposed to do.  
> Since the Java and Python versions both have the same behavior with regard to 
> aliases, I wonder if I'm understanding 
> http://avro.apache.org/docs/1.7.6/spec.html#Aliases correctly.)
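The issue expects record resolution to be both nominal and structural: the names must agree (or be related through a reader alias), and the fields the reader asks for must be compatible with the writer's. A minimal Python sketch of that two-part check follows. It is a hypothetical, simplified model, not Avro's implementation: Record, structurally_compatible, nominally_compatible, and resolves are invented names, field types are compared by plain equality, and namespaces and numeric promotion are ignored. The Java 1.7.6 behavior reported for records corresponds to running only structurally_compatible.

{code:python}
# Hypothetical, simplified model of Avro record resolution (not the Avro API).
# Field types are plain type names compared by equality; namespaces and
# numeric promotion (e.g. int -> long) are deliberately left out.
from dataclasses import dataclass, field


@dataclass
class Record:
    name: str
    fields: dict                                  # field name -> type name
    aliases: list = field(default_factory=list)   # reader-side aliases
    defaults: set = field(default_factory=set)    # fields that declare a default


def structurally_compatible(writer, reader):
    """Each reader field must exist in the writer with the same type,
    or be absent from the writer and have a default in the reader."""
    for name, typ in reader.fields.items():
        if name in writer.fields:
            if writer.fields[name] != typ:
                return False
        elif name not in reader.defaults:
            return False
    return True


def nominally_compatible(writer, reader):
    """The writer's name must be the reader's name or one of its aliases."""
    return writer.name == reader.name or writer.name in reader.aliases


def resolves(writer, reader):
    """Structural AND nominal check, as the issue expects."""
    return nominally_compatible(writer, reader) and structurally_compatible(writer, reader)


shared_fields = {"one": "int", "two": "string"}
writer = Record("Writer", shared_fields)
reader = Record("Reader", shared_fields)
aliased_reader = Record("Reader", shared_fields, aliases=["Writer"])

print(structurally_compatible(writer, reader))  # True: the only check reported for Java 1.7.6 records
print(resolves(writer, reader))                 # False: names differ and there is no alias
print(resolves(writer, aliased_reader))         # True: the alias bridges the names
{code}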



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
