Hmmmm -- maybe there's a solution that doesn't change the PCF behaviour.

In the PCF specification, instead of:

    [FULLNAMES] Replace short names with fullnames, using applicable
    namespaces to do so. Then eliminate namespace attributes, which are
    now redundant.

we change to:

    [FULLNAMES] Replace short names with fullnames, using applicable
    namespaces to do so. The only namespace attributes that can remain
    are namespace="" when a named schema is in the default namespace AND
    shouldn't inherit a namespace from a parent schema. All other
    namespace attributes are now redundant, and eliminated.

This still leaves the problem of needing fullnames in a UNION, aliases,
or as a named reference to a previously defined schema. This would
really only pose a problem when there's an ambiguity between Foo and
ns.Foo. Is there a clever way of disambiguating between these that
would leave existing fingerprints unchanged?

Another alternative might be to have a schema parsing mode that
disables namespace inheritance entirely, and to consider that PCF
schemas are only appropriately parsed in that mode.

There might be a couple of things we can do here to close this loophole
without breaking PCF and fingerprints!

All my best, Ryan

On Fri, Sep 2, 2022 at 5:39 PM Brennan Vincent <[email protected]> wrote:
>
> On 2022-09-02 01:34, Martin Grigorov wrote:
> > On Fri, Sep 2, 2022, 02:53 Brennan Vincent <[email protected]> wrote:
> >
> >> I don’t understand what you mean. I am talking about what to do with names
> >> that have no namespace. Obviously, in such a case there are no namespace
> >> attributes to remove.
> >>
> >
> > It seems I misunderstood your previous message then.
> >
> My point was that currently, there is no fullname
> corresponding to a name with no namespace. In the future, if
> we allow ".Foo", there will be one. Thus, following the
> description of PCF which mandates replacing all names by fullnames,
> we would replace "Foo" in a non-namespaced context by ".Foo", which
> differs from the current behavior of PCF.
>
> >
> >>> On Sep 1, 2022, at 16:34, Martin Grigorov <[email protected]> wrote:
> >>>
> >>>> On Thu, Sep 1, 2022 at 11:08 PM Brennan Vincent <[email protected]> wrote:
> >>>>
> >>>> On 2022-08-31 17:18, Martin Grigorov wrote:
> >>>>> On Wed, Aug 31, 2022 at 9:59 PM Brennan Vincent <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> On 2022-08-31 13:38, Ryan Skraba wrote:
> >>>>>>> Hello! I've been trying out some POC code with Java to see what would
> >>>>>>> be the impact on that SDK -- in the past, a lot of the development has
> >>>>>>> been pretty Java-centric, but this is definitely not a requirement!
> >>>>>>>
> >>>>>>> Currently, the worst scenario I found is something like:
> >>>>>>>
> >>>>>>> { "type" : "record",
> >>>>>>>   "name" : "A",
> >>>>>>>   "fields" : [ { "name" : "a1",
> >>>>>>>     "type" : {
> >>>>>>>       "type" : "record",
> >>>>>>>       "name" : "B",
> >>>>>>>       "fields" : [ { "name" : "b1", "type" : [ "null", "A" ],
> >>>>>>>         "default" : null } ] } } ] }
> >>>>>>>
> >>>>>>> This is a recursive definition that would look like a linked list
> >>>>>>> alternating A records containing B records containing A records, etc.
> >>>>>>>
> >>>>>>> If you were to only change the name of B to test.B (A fully qualified
> >>>>>>> namespace), Java can still parse the schema but the generated code
> >>>>>>> unsurprisingly no longer compiles. It correctly finds the outer
> >>>>>>> schema (and doesn't try to look for test.A) but it's impossible to
> >>>>>>> import into the generated Java code.
> >>>>>>>
> >>>>>>> If you were to only change the name of A to test.A, this is fine.
> >>>>>>>
> >>>>>>> I was playing around a bit with "auto-mangling" the packages to put A
> >>>>>>> in root$.A for this case, but I think it's a hopeless case for Java --
> >>>>>>> there's too many ways for the default package to "sneak" into the
> >>>>>>> system from other previously compiled classes, or from IDL, etc.
> >>>>>>>
> >>>>>>> I think it's still possible to try and accept the .Foo syntax but we'd
> >>>>>>> have to note that (for Java) mixing namespaced schemas and
> >>>>>>> null-namespaced schemas is either not supported, or we supply a
> >>>>>>> mechanism in Java to put ALL unnamespaced generated classes in a
> >>>>>>> folder like root$.
> >>>>>>>
> >>>>>>> Thanks for pointing out part 4, I'm also taking a look at the impact
> >>>>>>> there! Given that these mixed namespace schemas are likely to already
> >>>>>>> be broken, I don't know if it's too big of an impact! Especially if
> >>>>>>> we say that the dot is only added when strictly necessary to prevent
> >>>>>>> namespace inheritance.
> >>>>>>
> >>>>>> There is still a question for non-mixed schemas.
> >>>>>>
> >>>>>> Consider the following schema:
> >>>>>>
> >>>>>> {
> >>>>>>   "type": "fixed",
> >>>>>>   "name": "Foo",
> >>>>>>   "size": 10
> >>>>>> }
> >>>>>>
> >>>>>> Now, if we clarify the spec to say that leading dots are valid in
> >>>>>> default-namespace fullnames, then when this is normalized, the
> >>>>>> current language of the description of PCF implies that its
> >>>>>>
> >>>>>
> >>>>> Please copy/paste the text from the spec that implies that the name should
> >>>>> be ".Foo".
> >>>>> Otherwise we will have to guess which sentence you mean exactly.
> >>>>
> >>>> [FULLNAMES] Replace short names with fullnames, using applicable namespaces
> >>>> to do so. Then eliminate namespace attributes, which are now redundant.
> >>>
> >>> I totally agree that using namespaces everywhere is a best practice!
> >>> But eliminating the namespace attribute is not really an option due to
> >>> backward compatibility.
> >>>
> >>>>
> >>>>>
> >>>>> I don't see any pluses or minuses in using the leading dot in the PCF for
> >>>>> top-level names. IMO there is no difference with both representations.
> >>>>> For inner names the leading dot should be preserved in the PCF.
> >>>>> Otherwise it will start using the enclosing namespace after parsing.
> >>>>>
> >>>>>> name should be rewritten to ".Foo". However, this is contrary to current
> >>>>>> behavior.
> >>>>>>
> >>>>>> So, if it's okay to change the behavior on existing valid schemas, then
> >>>>>> we should do so. If it's not okay, then we should clarify the spec to
> >>>>>> say that names are normalized to fullnames for PCF, _except_
> >>>>>> in the special case of the non-default namespace.
> >>>>>>
> >>>>>>>
> >>>>>>> I'll keep digging on the Java side. Anybody else from the other SDKs
> >>>>>>> want to weigh in? What would happen with C# generated code?
> >>>>>>>
> >>>>>>> All my best, Ryan
> >>>>>>>
> >>>>>>> On Fri, Aug 26, 2022 at 4:10 PM Brennan Vincent <[email protected]>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> I’m in favor of allowing .Foo as a fullname for the following reasons:
> >>>>>>>>
> >>>>>>>> 1. I believe the *intent* of the initial change to the spec was to only
> >>>>>>>>    refer to namespaces;
> >>>>>>>> 2. Even if it is not possible in Java to generate code that refers to a
> >>>>>>>>    non-namespaced context from a namespaced one, it may be possible in
> >>>>>>>>    other languages;
> >>>>>>>> 3. We do not lose anything by supporting it.
> >>>>>>>> 4. Other parts of the spec assume that all names can be converted to a
> >>>>>>>>    fullname, specifically the parsing canonical form algorithm.
> >>>>>>>>
> >>>>>>>> Point 4. brings me to another issue. Currently, non-namespaced names
> >>>>>>>> are left as bare names in PCF, at least by the Python SDK - they are not
> >>>>>>>> converted to fullnames like .Foo (which makes sense, since that is out of
> >>>>>>>> spec). However, it contradicts the spec:
> >>>>>>>>
> >>>>>>>> [FULLNAMES] Replace short names with fullnames, using applicable
> >>>>>>>> namespaces to do so.
> >>>>>>>>
> >>>>>>>> The spec doesn’t say “only if the non-empty namespace is used”. It says
> >>>>>>>> to always do this. So if we enable the ability to write fullnames like
> >>>>>>>> .Foo, we need to decide whether to change the PCF behavior (this will
> >>>>>>>> change the fingerprints of existing schemas) to match the spec, or change
> >>>>>>>> the spec to match the current behavior.
> >>>>>>>>
> >>>>>>>>> On Aug 26, 2022, at 03:57, Ryan Skraba <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Hello! We can just discuss the impact here in the mailing list and
> >>>>>>>>> make a decision by consensus. Sometimes for major changes, we do a
> >>>>>>>>> more formal VOTE thread -- this might be one of those cases.
> >>>>>>>>>
> >>>>>>>>> What would happen if we were to say that ".MyRecord" was valid in the
> >>>>>>>>> next major version of Avro?
> >>>>>>>>>
> >>>>>>>>> Some SDKs used to accept this in the past and were made more strict,
> >>>>>>>>> causing working examples to break? That is really unfortunate.
> >>>>>>>>>
> >>>>>>>>> On the other hand, if we generate Java code today and map packages 1:1
> >>>>>>>>> to namespaces... we still won't be able to mix namespaced (in a
> >>>>>>>>> package) and unnamespaced (unpackaged) generated code. Would we just
> >>>>>>>>> mangle the default namespace to "default$" or ... ? A configuration
> >>>>>>>>> option for the SpecificCompiler in Java?
> >>>>>>>>>
> >>>>>>>>> Either way, it would be great if we didn't leave this point vague in
> >>>>>>>>> the spec! There's always the possibility to allow language SDKs to
> >>>>>>>>> deviate from the spec -- if e.g. python or Java has a
> >>>>>>>>> "setValidateUnqualifiedNamespace(boolean)" method, we can leave it up
> >>>>>>>>> to the user whether or not to follow the strict spec. We already do
> >>>>>>>>> this with validating defaults in Java, for example.
> >>>>>>>>>
> >>>>>>>>> It might take a bit of thought, but if we can find some elegant way to
> >>>>>>>>> make this work I don't see why we wouldn't make specification changes!
> >>>>>>>>>
> >>>>>>>>> Ryan
> >>>>>>>>>
> >>>>>>>>>> On Thu, Aug 25, 2022 at 7:31 PM Brennan Vincent <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> That is a fair point also.
> >>>>>>>>>>
> >>>>>>>>>> Anyway, since I'm not an Apache project member, I'm not quite sure what
> >>>>>>>>>> is the best way to move forward here. Is there a formal process for
> >>>>>>>>>> proposing changes to the spec and reaching a consensus?
> >>>>>>>>>>
> >>>>>>>>>> Thanks
> >>>>>>>>>> Brennan
> >>>>>>>>>>
> >>>>>>>>>>> On 2022-08-25 01:36, Oscar Westra van Holthe - Kind wrote:
> >>>>>>>>>>> Hi all,
> >>>>>>>>>>>
> >>>>>>>>>>> Allowing references to the null namespace from within another namespace
> >>>>>>>>>>> gives schema authors more options.
> >>>>>>>>>>>
> >>>>>>>>>>> But if you're using namespaces at all, there must be a reason for it.
> >>>>>>>>>>> As a schema author, you've made the decision to group your schemata.
> >>>>>>>>>>>
> >>>>>>>>>>> To make this decision from schema authors more visible, I'd opt to choose
> >>>>>>>>>>> the Java route and in that case force all schemata to belong to a group.
> >>>>>>>>>>> I.e., explicitly disallow identifiers to start with a dot (and disallow
> >>>>>>>>>>> references to the null namespace from within another namespace).
> >>>>>>>>>>>
> >>>>>>>>>>> Kind regards,
> >>>>>>>>>>> Oscar
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Oscar Westra van Holthe - Kind <[email protected]>
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Aug 24, 2022 at 14:42, Ryan Skraba <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hello! There is definitely an ambiguity here caused by inheriting
> >>>>>>>>>>>> namespaces.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The obvious takeaway is to use a namespace with all of your named
> >>>>>>>>>>>> schemas. As a best practice, that avoids the problem of mixing
> >>>>>>>>>>>> schemas with and without namespaces, and it's probably this techniq
> >>>>>>>>>>>>
> >>>>>>>>>>>> This same problem occurs in Java classes, where you can have a class
> >>>>>>>>>>>> in the default package (without a package name), but it's an error to
> >>>>>>>>>>>> import it into other packages.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The ".MyRecord" notation might be the right way to clarify this, but
> >>>>>>>>>>>> we can also go the Java route (i.e. you can't mix namespaced schema
> >>>>>>>>>>>> and non-namespaced schemas). What do you think?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best regards, Ryan
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Mon, Aug 22, 2022 at 10:49 PM Brennan Vincent <[email protected]>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 2022/08/22 20:05:22 Martin Grigorov wrote:
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I might be wrong but I think your sample schema should be valid! Does it
> >>>>>>>>>>>>>> fail with any of the SDKs ?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Yes. It fails with the Python avro package.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> This part of the spec talks about the namespace, not the type. I.e.
> >>>>>>>>>>>>>> "namespace": ".ns" would be an error.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The linked thread (
> >>>>>>>>>>>>> https://lists.apache.org/thread/q0o58fxgvstvdlgpoyv2pcz53borp587 )
> >>>>>>>>>>>>> is a bit vague -- it's not totally clear whether the restriction is
> >>>>>>>>>>>>> meant to apply to namespaces only, or to fullnames also.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> "The null namespace may not be used in a dot-separated sequence of
> >>>>>>>>>>>>> names."
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> certainly makes it sound like it applies to _any_ sequence of names,
> >>>>>>>>>>>>> though, not just in a namespace field.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Mon, Aug 22, 2022 at 10:40 PM Brennan Vincent <[email protected]>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> https://github.com/apache/avro/pull/917 introduced the following
> >>>>>>>>>>>>>>> language to the spec:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The null namespace may not be used in a dot-separated sequence of
> >>>>>>>>>>>>>>>> names.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thus ruling out fullnames like ".foo".
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> However, this seems to rule out referring to names in the default
> >>>>>>>>>>>>>>> namespace from another namespace.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> For example, this schema was previously allowed by the spec:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>>   "type": "record",
> >>>>>>>>>>>>>>>   "name": "r",
> >>>>>>>>>>>>>>>   "fields": [
> >>>>>>>>>>>>>>>     {
> >>>>>>>>>>>>>>>       "name": "f",
> >>>>>>>>>>>>>>>       "type": {
> >>>>>>>>>>>>>>>         "type": "record",
> >>>>>>>>>>>>>>>         "name": "r2",
> >>>>>>>>>>>>>>>         "namespace": "ns",
> >>>>>>>>>>>>>>>         "fields": [
> >>>>>>>>>>>>>>>           {
> >>>>>>>>>>>>>>>             "name": "f2",
> >>>>>>>>>>>>>>>             "type": ["null", ".r"]
> >>>>>>>>>>>>>>>           }
> >>>>>>>>>>>>>>>         ]
> >>>>>>>>>>>>>>>       }
> >>>>>>>>>>>>>>>     }
> >>>>>>>>>>>>>>>   ]
> >>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Note ".r" in the type of "f2". This can't be changed to "r",
> >>>>>>>>>>>>>>> because that would be interpreted as "ns.r" due to "ns" being the
> >>>>>>>>>>>>>>> nearest enclosing namespace.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thus it seems that the new spec has restricted the set of valid schemas
> >>>>>>>>>>>>>>> and there is no longer any way to accomplish this.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Am I misinterpreting the spec? Does the empty namespace being disallowed
> >>>>>>>>>>>>>>> in dotted sequences of names only apply to initial name definitions, but
> >>>>>>>>>>>>>>> not to later name references? Or is there some other way to express this?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Here is the initial discussion of this change, where the issue I'm raising
> >>>>>>>>>>>>>>> here doesn't appear to have come up:
> >>>>>>>>>>>>>>> https://lists.apache.org/thread/q0o58fxgvstvdlgpoyv2pcz53borp587
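
To make the two rules discussed in this thread concrete, here is a minimal Python sketch. Both functions are hypothetical helpers written only for this illustration (they are not part of the avro Python package or any other SDK), and the leading-dot handling reflects the ".Foo" syntax under discussion, not behaviour the spec currently guarantees. resolve_reference shows why "r" inside namespace "ns" cannot reach the top-level "r"; canonical_name_attrs shows one possible reading of the reworded [FULLNAMES] step, where namespace="" is the only namespace attribute that survives.

```python
def resolve_reference(name, enclosing_namespace=""):
    """Resolve a name reference to a fullname (hypothetical helper).

    Assumption: a leading dot means "the null namespace", which is the
    contested ".Foo" form from this thread.
    """
    if name.startswith("."):
        return name[1:]          # ".r" -> "r", no inheritance
    if "." in name:
        return name              # already a fullname
    if enclosing_namespace:
        return enclosing_namespace + "." + name  # inherits the namespace
    return name


def canonical_name_attrs(name, namespace=None, enclosing_namespace=""):
    """Name attributes for one named schema under the proposed wording.

    namespace is the schema's own "namespace" attribute (None if absent).
    Assumption: namespace="" is kept only when the schema sits in the
    null namespace but would otherwise inherit a parent namespace.
    """
    if "." in name:
        # A dotted name is a fullname; the namespace attribute is ignored.
        ns, _, short = name.rpartition(".")
    else:
        ns = enclosing_namespace if namespace is None else namespace
        short = name
    attrs = {"name": ns + "." + short if ns else short}
    if not ns and enclosing_namespace:
        attrs["namespace"] = ""  # the one attribute that is not redundant
    return attrs


print(resolve_reference("r", "ns"))    # ns.r
print(resolve_reference(".r", "ns"))   # r
print(canonical_name_attrs("Foo"))     # {'name': 'Foo'}
print(canonical_name_attrs("Foo", namespace="", enclosing_namespace="ns"))
# {'name': 'Foo', 'namespace': ''}
```

Under this reading a top-level schema named "Foo" with no namespace keeps the bare name "Foo", so its existing fingerprint would be unaffected; only the mixed-namespace case gains the extra attribute.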
