also - could we have this discussion on the JIRA ticket itself? it's much easier (for me?) to find that way
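
for anyone skimming the thread, here is a rough sketch (python, NOT avro's actual code) of the alias resolution rule quoted in the ticket below. the function name resolve_alias is made up for illustration, and treating a leading dot as "targets the null namespace" is exactly the interpretation under discussion, not something the spec states explicitly.

```python
def resolve_alias(alias: str, declaring_fullname: str) -> str:
    """Return the full name an alias refers to (illustrative sketch only).

    - alias with a leading dot: ASSUMPTION, refers to the null namespace
    - alias containing a dot: already a full name, used as-is
    - simple alias: inherits the namespace of the declaring type
    """
    if alias.startswith("."):
        # hypothetical null-namespace marker: drop the dot, keep the bare name
        return alias[1:]
    if "." in alias:
        return alias
    # everything before the last dot of the declaring full name is its namespace
    namespace, _, _ = declaring_fullname.rpartition(".")
    return f"{namespace}.{alias}" if namespace else alias


if __name__ == "__main__":
    # the example from the spec quote: type "a.b" with aliases "c" and "x.y"
    print(resolve_alias("c", "a.b"))
    print(resolve_alias("x.y", "a.b"))
    # dotted alias from the reader schema in the ticket
    print(resolve_alias(".AncientEnum", "much.namespace.ModernEnum"))
    # same alias after the leading dot has been stripped
    print(resolve_alias("AncientEnum", "much.namespace.ModernEnum"))
```

under these assumptions the last two calls give different full names ("AncientEnum" vs "much.namespace.AncientEnum"), which is why losing the leading dot when a schema is printed and re-parsed would stop the alias from matching the writer schema.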

On Fri, May 13, 2022 at 9:10 AM r <[email protected]> wrote:

> Hi, and sorry for the late response.
>
> aliases are not for "import" reasons. actually the avsc spec (unlike the
> more modern IDL) doesnt have any notion of imports. aliases (in my
> understanding) are about schema evolution and compatibility (for I/O
> operations - encoding/decoding, NOT for the compatibility of generated code?).
> also avro supports things that various impl languages dont (java example -
> you can have avro schema fields with names that are reserved keywords in
> java and they will be "mangled" in java generated code).
>
> as for the need for this feature - it would be impossible to evolve a
> schema out of the null namespace without this, as newer versions of the
> schema (with namespace) acting as reader schemas would not be able to
> decode data written using older versions of the schema (without namespace).
>
> as such I think this should be made a feature. without it users who need
> to enable such schema evolution (like me) would have to write code to
> "doctor" the writer schema to match the reader schema ...
>
> On Fri, May 13, 2022 at 5:56 AM Oscar Westra van Holthe - Kind <
> [email protected]> wrote:
>
>> Hi,
>>
>> It's true that the spec doesn't describe how aliases would target the null
>> namespace. But on the other hand, I would not expect this to be allowed at
>> all:
>>
>> - In Java, it's an explicit compiler error to import from the unnamed
>>   package
>> - In Python, import statements must be able to address whatever you're
>>   importing: importing from something unnamed is not possible
>> - Other languages I've seen also don't support unnamed namespaces
>> - Scala is an exception, in that imports are always relative, and that
>>   caused it to support the _root_ package
>>
>> Given how the spec describes full names as "a dot-separated sequence of
>> [simple] names", I'd say addressing the null namespace is not supported.
>>
>> I'm not against such a feature though, but we should explicitly document
>> how aliases (and full names) could contain the null namespace.
>>
>>
>> Kind regards,
>> Oscar
>>
>>
>> On Sat, 7 May 2022 at 16:16, Radai Rosenblatt (Jira) <[email protected]>
>> wrote:
>>
>> >
>> > [
>> > https://issues.apache.org/jira/browse/AVRO-3512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>> > ]
>> >
>> > Radai Rosenblatt updated AVRO-3512:
>> > -----------------------------------
>> > Description:
>> > the avro spec allows for the "null namespace" (when no namespace is
>> > specified anywhere). it also has [the following|
>> > https://avro.apache.org/docs/current/spec.html#Aliases] to say about
>> > aliases:
>> > {quote}if a type named "a.b" has aliases of "c" and "x.y", then the fully
>> > qualified names of its aliases are "a.c" and "x.y"
>> > {quote}
>> > which means a "simple" alias ("c" above) inherits any namespace defined on
>> > the declaring type.
>> >
>> > now suppose i was to use aliases on a namespaced schema to be able to read
>> > data written using a schema that is in the null namespace (has no
>> > namespace).
>> >
>> > here are my writer schema:
>> > {code:json}
>> > {
>> >   "type": "record",
>> >   "name": "AncientSchema",
>> >   "fields": [
>> >     {
>> >       "name" : "enumField",
>> >       "type" : {
>> >         "type" : "enum",
>> >         "name" : "AncientEnum",
>> >         "symbols" : [ "THE", "SPEC", "IS", "A", "LIE" ]
>> >       }
>> >     }
>> >   ]
>> > }
>> > {code}
>> > and reader schema:
>> > {code:json}
>> > {
>> >   "type": "record",
>> >   "namespace": "much.namespace",
>> >   "name": "ModernRecord",
>> >   "fields": [
>> >     {
>> >       "name" : "enumField",
>> >       "type" : {
>> >         "type" : "enum",
>> >         "name" : "ModernEnum",
>> >         "symbols" : [ "THE", "SPEC", "IS", "A", "LIE" ],
>> >         "aliases": [
>> >           ".AncientEnum"
>> >         ]
>> >       }
>> >     }
>> >   ],
>> >   "aliases": [
>> >     ".AncientSchema"
>> >   ]
>> > }
>> > {code}
>> > notice the dots used in the aliases.
>> > as far as i understand the spec this
>> > should be the only legal way to do this. and it does indeed work .... to a
>> > point.
>> >
>> > when testing this i found multiple issues with avro's handling of such
>> > aliases, dating back to late avro 1.7.*
>> >
>> > # without these aliases, decoding does fail, but it fails over the nested
>> > enum, whereas it should have failed "immediately" on the fullname mismatch
>> > on the top level record schema. in fact, on further testing i think avro
>> > (at least in java) doesnt bother comparing the fullnames on the top level
>> > writer vs reader schemas at all?
>> > # while the schema with the aliases parse()es fine, Schema.toString()
>> > strips out the dots from the aliases, thereby creating a "monsanto
>> > terminator schema" - once printed and parsed again the aliases would become
>> > "simple aliases" and stop working
>> > # the spec doesnt explicitly talk about how to use aliases to "target"
>> > the null namespace. if this is an intentional feature I think the spec
>> > should be expanded a little to cover it?
>> >
>> > i have code to reproduce all these issues in [
>> > https://github.com/radai-rosenblatt/avro/blob/aliasing-to-null-namespace/lang/java/avro/src/test/java/org/apache/avro/TestAliasToNullNamespace.java
>> > ]
>> > (coded against master)
>> >
>> > i also have code to reproduce all the above against multiple older avro
>> > versions in [
>> > https://github.com/linkedin/avro-util/blob/master/helper/tests/helper-tests-allavro/src/test/java/com/linkedin/avroutil1/compatibility/AvroTypeAliasesTest.java
>> > ]
>> >
>> > was:
>> > the avro spec allows for the "null namespace" (when no namespace is
>> > specified anywhere).
>> > it also has [the following|
>> > https://avro.apache.org/docs/current/spec.html#Aliases] to say about
>> > aliases:
>> > {quote}if a type named "a.b" has aliases of "c" and "x.y", then the fully
>> > qualified names of its aliases are "a.c" and "x.y"
>> > {quote}
>> > which means a "simple" alias ("c" above) inherits any namespace defined on
>> > the declaring type.
>> >
>> > now suppose i was to use aliases on a namespaced schema to be able to read
>> > data written using a schema that is in the null namespace (has no
>> > namespace).
>> >
>> > here are my writer schema:
>> > {code:json}
>> > {
>> >   "type": "record",
>> >   "name": "AncientSchema",
>> >   "fields": [
>> >     {
>> >       "name" : "enumField",
>> >       "type" : {
>> >         "type" : "enum",
>> >         "name" : "AncientEnum",
>> >         "symbols" : [ "THE", "SPEC", "IS", "A", "LIE" ]
>> >       }
>> >     }
>> >   ]
>> > }
>> > {code}
>> > and reader schema:
>> > {code:json}
>> > {
>> >   "type": "record",
>> >   "namespace": "much.namespace",
>> >   "name": "ModernRecord",
>> >   "fields": [
>> >     {
>> >       "name" : "enumField",
>> >       "type" : {
>> >         "type" : "enum",
>> >         "name" : "ModernEnum",
>> >         "symbols" : [ "THE", "SPEC", "IS", "A", "LIE" ],
>> >         "aliases": [
>> >           ".AncientEnum"
>> >         ]
>> >       }
>> >     }
>> >   ],
>> >   "aliases": [
>> >     ".AncientSchema"
>> >   ]
>> > }
>> > {code}
>> > notice the dots used in the aliases. as far as i understand the spec this
>> > should be the only legal way to do this. and it does indeed work .... to a
>> > point.
>> >
>> > when testing this i found multiple issues with avro's handling of such
>> > aliases, dating back to late avro 1.7.*
>> >
>> > # without these aliases, decoding does fail, but it fails over the nested
>> > enum, whereas it should have failed "immediately" on the fullname mismatch
>> > on the top level record schema.
>> > in fact, on further testing i think avro
>> > (at least in java) doesnt bother comparing the fullnames on the top level
>> > writer vs reader schemas at all?
>> > # while the schema with the aliases parse()es fine, Schema.toString()
>> > strips out the dots from the aliases, thereby creating a "monsanto
>> > terminator schema" - once printed and parsed again the aliases would become
>> > "simple aliases" and stop working
>> > # the spec doesnt explicitly talk about how to use aliases to "target"
>> > the null namespace. if this is an intentional specification I think the
>> > spec should be expanded a little to cover it?
>> >
>> > i have code to reproduce all these issues in [
>> > https://github.com/radai-rosenblatt/avro/blob/aliasing-to-null-namespace/lang/java/avro/src/test/java/org/apache/avro/TestAliasToNullNamespace.java
>> > ]
>> > (coded against master)
>> >
>> > i also have code to reproduce all the above against multiple older avro
>> > versions in [
>> > https://github.com/linkedin/avro-util/blob/master/helper/tests/helper-tests-allavro/src/test/java/com/linkedin/avroutil1/compatibility/AvroTypeAliasesTest.java
>> > ]
>> >
>> >
>> > > aliases to the null namespace do not work as expected
>> > > -----------------------------------------------------
>> > >
>> > >                 Key: AVRO-3512
>> > >                 URL: https://issues.apache.org/jira/browse/AVRO-3512
>> > >             Project: Apache Avro
>> > >          Issue Type: Bug
>> > >          Components: java, spec
>> > >    Affects Versions: 1.11.0
>> > >            Reporter: Radai Rosenblatt
>> > >            Priority: Major
>> > >
>> >
>> >
>> > --
>> > This message was sent by Atlassian Jira
>> > (v8.20.7#820007)
>> >
>>
>>
>> --
>>
>> ✉️ Oscar Westra van Holthe - Kind <[email protected]>
>>
>
