Re: [jira] [Updated] (AVRO-3512) aliases to the null namespace do not work as expected

Oscar Westra van Holthe - Kind Fri, 13 May 2022 05:56:45 -0700

Hi,

It's true that the spec doesn't describe how aliases would target the null
namespace. But on the other hand, I would not expect this to be allowed at
all:


   - In Java, it's an explicit compiler error to import from the unnamed
   package
   - In Python, import statements must be able to address whatever you're
   importing: importing from something unnamed is not possible
   - Other languages I've seen also don't support unnamed namespaces
   - Scala is an exception, in that imports are always relative, and that
   caused it to support the _root_ package

Given how the spec describes full names as "a dot-separated sequence of
[simple] names", I'd say addressing the null namespace is not supported.
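
To make that reading concrete, here is a minimal sketch in Python of a strict check for "a dot-separated sequence of names". The simple-name pattern (a letter or underscore, then letters, digits or underscores) is my recollection of the spec's name grammar, and the function name is_strict_full_name is made up for this mail; it is not part of any Avro library.

```python
import re

# Simple-name grammar as I recall it from the Avro spec (an assumption here):
# a letter or underscore, followed by letters, digits, or underscores.
SIMPLE_NAME = r"[A-Za-z_][A-Za-z0-9_]*"

# A full name read strictly as "a dot-separated sequence of names".
FULL_NAME = re.compile(rf"{SIMPLE_NAME}(?:\.{SIMPLE_NAME})*")


def is_strict_full_name(name: str) -> bool:
    """Return True if `name` is a dot-separated sequence of simple names."""
    return FULL_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("much.namespace.ModernEnum", "AncientEnum", ".AncientEnum"):
        print(candidate, is_strict_full_name(candidate))
```

Under this strict reading, a name with a leading dot such as ".AncientEnum" is rejected, because the part before the first dot is empty and an empty string is not a simple name.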

I'm not against such a feature, but then we should explicitly document
how aliases (and full names) can address the null namespace.
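
As a starting point for such documentation, here is a rough Python sketch of one possible rule. It keeps the behaviour the spec already describes (a simple alias inherits the namespace of the declaring type) and adds, purely as an assumption, that a leading dot means "the null namespace". The function resolve_alias is hypothetical and only illustrates the rule; it does not reflect what any Avro implementation is guaranteed to do.

```python
from typing import Optional, Tuple


def resolve_alias(alias: str, declaring_namespace: Optional[str]) -> Tuple[Optional[str], str]:
    """Resolve an alias to (namespace, simple name).

    A namespace of None stands for the null namespace.
    """
    if "." not in alias:
        # Simple alias: inherits the namespace of the declaring type,
        # as in the spec's example ("c" on type "a.b" becomes "a.c").
        return (declaring_namespace or None, alias)
    namespace, _, name = alias.rpartition(".")
    # ASSUMPTION (not stated in the spec): an empty namespace part,
    # i.e. a leading dot, addresses the null namespace.
    return (namespace or None, name)


if __name__ == "__main__":
    print(resolve_alias("c", "a"))
    print(resolve_alias("x.y", "a"))
    print(resolve_alias(".AncientEnum", "much.namespace"))
```

With this rule, the aliases "c" and "x.y" on a type named "a.b" resolve to "a.c" and "x.y" as the spec says, while ".AncientEnum" declared inside "much.namespace" resolves to "AncientEnum" in the null namespace.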


Kind regards,
Oscar


On Sat, 7 May 2022 at 16:16, Radai Rosenblatt (Jira) <[email protected]>
wrote:

>
>      [
> https://issues.apache.org/jira/browse/AVRO-3512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> ]
>
> Radai Rosenblatt updated AVRO-3512:
> -----------------------------------
>     Description:
> the avro spec allows for the "null namespace" (when no namespace is
> specified anywhere). it also has [the following|
> https://avro.apache.org/docs/current/spec.html#Aliases] to say about
> aliases:
> {quote}if a type named "a.b" has aliases of "c" and "x.y", then the fully
> qualified names of its aliases are "a.c" and "x.y"
> {quote}
> which means a "simple" alias ("c" above) inherits any namespace defined on
> the declaring type.
>
>
>
> now suppose i was to use aliases on a namespaced schema to be able to read
> data written using a schema that is in the null namespace (has no
> namespace).
>
> here is my writer schema:
> {code:json}
> {
>   "type": "record",
>   "name": "AncientSchema",
>   "fields": [
>     {
>       "name" : "enumField",
>       "type" : {
>         "type" : "enum",
>         "name" : "AncientEnum",
>         "symbols" : [ "THE", "SPEC", "IS", "A", "LIE" ]
>       }
>     }
>   ]
> }
> {code}
> and reader schema:
> {code:json}
> {
>   "type": "record",
>   "namespace": "much.namespace",
>   "name": "ModernRecord",
>   "fields": [
>     {
>       "name" : "enumField",
>       "type" : {
>         "type" : "enum",
>         "name" : "ModernEnum",
>         "symbols" : [ "THE", "SPEC", "IS", "A", "LIE" ],
>         "aliases": [
>            ".AncientEnum"
>         ]
>       }
>     }
>   ],
>   "aliases": [
>     ".AncientSchema"
>   ]
> }
> {code}
> notice the dots used in the aliases. as far as i understand the spec this
> should be the only legal way to do this. and it does indeed work .... to a
> point.
>
>
>
> when testing this i found multiple issues with avro's handling of such
> aliases, dating back to late avro 1.7.*
>
>
>  # without these aliases, decoding does fail, but it fails over the nested
> enum, whereas it should have failed "immediately" on the fullname mismatch
> on the top level record schema. in fact, on further testing i think avro
> (at least in java) doesn't bother comparing the fullnames on the top level
> writer vs reader schemas at all?
>  # while the schema with the aliases parse()es fine, Schema.toString()
> strips out the dots from the aliases, thereby creating a "monsanto
> terminator schema" - once printed and parsed again the aliases would become
> "simple aliases" and stop working
>  # the spec doesn't explicitly talk about how to use aliases to "target"
> the null namespace. if this is an intentional feature I think the spec
> should be expanded a little to cover it?
>
>
>
> i have code to reproduce all these issues in [
> https://github.com/radai-rosenblatt/avro/blob/aliasing-to-null-namespace/lang/java/avro/src/test/java/org/apache/avro/TestAliasToNullNamespace.java]
> (coded against master)
>
>
>
> i also have code to reproduce all the above against multiple older avro
> versions in [
> https://github.com/linkedin/avro-util/blob/master/helper/tests/helper-tests-allavro/src/test/java/com/linkedin/avroutil1/compatibility/AvroTypeAliasesTest.java
> ]
>
>
> > aliases to the null namespace do not work as expected
> > -----------------------------------------------------
> >
> >                 Key: AVRO-3512
> >                 URL: https://issues.apache.org/jira/browse/AVRO-3512
> >             Project: Apache Avro
> >          Issue Type: Bug
> >          Components: java, spec
> >    Affects Versions: 1.11.0
> >            Reporter: Radai Rosenblatt
> >            Priority: Major
> >
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.20.7#820007)
>


-- 

✉️ Oscar Westra van Holthe - Kind <[email protected]>
