On 2022-08-31 13:38, Ryan Skraba wrote:
> Hello! I've been trying out some POC code with Java to see what would
> be the impact on that SDK -- in the past, a lot of the development has
> been pretty Java-centric, but this is definitely not a requirement!
>
> Currently, the worst scenario I found is something like:
>
> { "type" : "record",
>   "name" : "A",
>   "fields" : [ { "name" : "a1",
>                  "type" : {
>                    "type" : "record",
>                    "name" : "B",
>                    "fields" : [ { "name" : "b1", "type" : [ "null", "A" ],
>                                   "default" : null } ] } } ] }
>
> This is a recursive definition that would look like a linked list
> alternating A records containing B records containing A records, etc.
>
> If you were to only change the name of B to test.B (a fully qualified
> name), Java can still parse the schema but the generated code
> unsurprisingly no longer compiles. It correctly finds the outer
> schema (and doesn't try to look for test.A) but it's impossible to
> import into the generated Java code.
>
> If you were to only change the name of A to test.A, this is fine.
>
> I was playing around a bit with "auto-mangling" the packages to put A
> in root$.A for this case, but I think it's a hopeless case for Java --
> there are too many ways for the default package to "sneak" into the
> system from other previously compiled classes, or from IDL, etc.
>
> I think it's still possible to try and accept the .Foo syntax but we'd
> have to note that (for Java) mixing namespaced schemas and
> null-namespaced schemas is either not supported, or we supply a
> mechanism in Java to put ALL unnamespaced generated classes in a
> folder like root$.
>
> Thanks for pointing out part 4, I'm also taking a look at the impact
> there! Given that these mixed namespace schemas are likely to already
> be broken, I don't know if it's too big of an impact! Especially if
> we say that the dot is only added when strictly necessary to prevent
> namespace inheritance.
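To make the resolution rule concrete, here is a minimal Python sketch of
how a type reference could be resolved if a leading dot were accepted as
meaning the null namespace. The function name resolve_reference is
hypothetical and not part of any Avro SDK; it only illustrates the rule
under discussion, not what the spec currently says.

```python
def resolve_reference(name, enclosing_namespace):
    """Return (namespace, short_name) for a type reference.

    Assumption: a name containing a dot is treated as a fullname, and a
    leading dot (as in ".r") means the null namespace, written here as "".
    """
    if "." in name:
        # Everything before the last dot is the namespace.
        namespace, _, short_name = name.rpartition(".")
        return namespace, short_name
    # A bare name inherits the nearest enclosing namespace.
    return enclosing_namespace, name


print(resolve_reference(".r", "ns"))    # leading dot: null namespace
print(resolve_reference("r", "ns"))     # bare name: inherits "ns"
print(resolve_reference("test.B", ""))  # ordinary fullname
```

With this rule the dot is only needed when a bare name would otherwise
inherit a non-null enclosing namespace.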
There is still a question for non-mixed schemas.
Consider the following schema:
{
  "type": "fixed",
  "name": "Foo",
  "size": 10
}
Now, if we clarify the spec to say that leading dots are valid in
default-namespace fullnames, then when this is normalized, the
current language of the description of PCF implies that its
name should be rewritten to ".Foo". However, this is contrary to current
behavior.
So, if it's okay to change the behavior on existing valid schemas, then
we should do so. If it's not okay, then we should clarify the spec to
say that names are normalized to fullnames for PCF, _except_
in the special case of the default namespace.
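As a rough illustration of the two options, the sketch below builds the
canonical text for the fixed schema above both ways and hashes it. The
helpers pcf_name and pcf_fixed and the dot_for_default flag are
hypothetical, not part of any Avro SDK; only the fixed type is handled,
string escaping is ignored, and SHA-256 is just one possible fingerprint.

```python
import hashlib


def pcf_name(short_name, namespace, dot_for_default):
    """Return the name as it would appear in the canonical form."""
    if namespace:
        return f"{namespace}.{short_name}"
    # Null namespace: either keep the bare name (current behavior)
    # or prefix a dot (literal reading of the FULLNAMES rule).
    return f".{short_name}" if dot_for_default else short_name


def pcf_fixed(short_name, namespace, size, dot_for_default):
    """Canonical text for a fixed schema: name, type, size, no whitespace."""
    name = pcf_name(short_name, namespace, dot_for_default)
    return '{"name":"%s","type":"fixed","size":%d}' % (name, size)


current = pcf_fixed("Foo", "", 10, dot_for_default=False)
dotted = pcf_fixed("Foo", "", 10, dot_for_default=True)

print(current)
print(dotted)
print(hashlib.sha256(current.encode("utf-8")).hexdigest())
print(hashlib.sha256(dotted.encode("utf-8")).hexdigest())
```

The two canonical strings differ only in the name, so any fingerprint
computed from them differs as well.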
>
> I'll keep digging on the Java side. Anybody else from the other SDKs
> want to weigh in? What would happen with C# generated code?
>
> All my best, Ryan
>
>
>
> On Fri, Aug 26, 2022 at 4:10 PM Brennan Vincent <[email protected]>
> wrote:
>>
>> I’m in favor of allowing .Foo as a fullname for the following reasons:
>>
>> 1. I believe the *intent* of the initial change to the spec was to only
>> refer to namespaces;
>> 2. Even if it is not possible in Java to generate code that refers to a
>> non-namespaced context from a namespaced one, it may be possible in other
>> languages;
>> 3. We do not lose anything by supporting it.
>> 4. Other parts of the spec assume that all names can be converted to a
>> fullname, specifically the parsing canonical form algorithm.
>>
>> Point 4. brings me to another issue. Currently, non-namespaced names are
>> left as bare names in PCF, at least by the Python SDK - they are not
>> converted to fullnames like .Foo (which makes sense, since that is out of
>> spec). However, it contradicts the spec:
>>
>> [FULLNAMES] Replace short names with fullnames, using applicable namespaces
>> to do so.
>>
>> The spec doesn’t say “only if the non-empty namespace is used”. It says to
>> always do this. So if we enable the ability to write fullnames like .Foo, we
>> need to decide whether to change the PCF behavior (this will change the
>> fingerprints of existing schemas) to match the spec, or change the spec to
>> match the current behavior.
>>
>>> On Aug 26, 2022, at 03:57, Ryan Skraba <[email protected]> wrote:
>>>
>>> Hello! We can just discuss the impact here in the mailing list and
>>> make a decision by consensus. Sometimes for major changes, we do a
>>> more formal VOTE thread -- this might be one of those cases.
>>>
>>> What would happen if we were to say that ".MyRecord" was valid in the
>>> next major version of Avro?
>>>
>>> Some SDKs used to accept this in the past and were made more strict,
>>> causing working examples to break? That is really unfortunate.
>>>
>>> On the other hand, if we generate Java code today and map packages 1:1
>>> to namespaces... we still won't be able to mix namespaced (in a
>>> package) and unnamespaced (unpackaged) generated code. Would we just
>>> mangle the default namespace to "default$" or ... ? A configuration
>>> option for the SpecificCompiler in Java?
>>>
>>> Either way, it would be great if we didn't leave this point vague in
>>> the spec! There's always the possibility to allow language SDKs to
>>> deviate from the spec -- if e.g. Python or Java has a
>>> "setValidateUnqualifiedNamespace(boolean)" method, we can leave it up
>>> to the user whether or not to follow the strict spec. We already do
>>> this with validating defaults in Java, for example.
>>>
>>> It might take a bit of thought, but if we can find some elegant way to
>>> make this work I don't see why we wouldn't make specification changes!
>>>
>>> Ryan
>>>
>>>
>>>
>>>
>>>> On Thu, Aug 25, 2022 at 7:31 PM Brennan Vincent <[email protected]>
>>>> wrote:
>>>>
>>>> That is a fair point also.
>>>>
>>>> Anyway, since I'm not an Apache project member, I'm not quite sure what
>>>> is the best way to move forward here. Is there a formal process for
>>>> proposing changes to the spec and reaching a consensus?
>>>>
>>>> Thanks
>>>> Brennan
>>>>
>>>>> On 2022-08-25 01:36, Oscar Westra van Holthe - Kind wrote:
>>>>> Hi all,
>>>>>
>>>>> Allowing references to the null namespace from within another namespace
>>>>> gives schema authors more options.
>>>>>
>>>>> But if you're using namespaces at all, there must be a reason for it. As a
>>>>> schema author, you've made the decision to group your schemata.
>>>>>
>>>>> To make this decision from schema authors more visible, I'd opt to choose
>>>>> the Java route and in that case force all schemata to belong to a group.
>>>>> I.e., explicitly disallow identifiers to start with a dot (and disallow
>>>>> references to the null namespace from within another namespace).
>>>>>
>>>>>
>>>>> Kind regards,
>>>>> Oscar
>>>>>
>>>>> --
>>>>> Oscar Westra van Holthe - Kind <[email protected]>
>>>>>
>>>>> Op wo 24 aug. 2022 14:42 schreef Ryan Skraba <[email protected]>:
>>>>>
>>>>>> Hello! There is definitely an ambiguity here caused by inheriting
>>>>>> namespaces.
>>>>>>
>>>>>> The obvious takeaway is to use a namespace with all of your named
>>>>>> schemas. As a best practice, that avoids the problem of mixing
>>>>>> schemas with and without namespaces, and it's probably this techniq
>>>>>>
>>>>>> This same problem occurs in Java classes, where you can have a class
>>>>>> in the default package (without a package name), but it's an error to
>>>>>> import it into other packages.
>>>>>>
>>>>>> The ".MyRecord" notation might be the right way to clarify this, but
>>>>>> we can also go the Java route (i.e. you can't mix namespaced schema
>>>>>> and non-namespaced schemas). What do you think?
>>>>>>
>>>>>> Best regards, Ryan
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 22, 2022 at 10:49 PM Brennan Vincent <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> On 2022/08/22 20:05:22 Martin Grigorov wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I might be wrong but I think your sample schema should be valid! Does
>>>>>>>> it fail with any of the SDKs?
>>>>>>>
>>>>>>> Yes. It fails with the Python avro package.
>>>>>>>
>>>>>>>>
>>>>>>>> This part of the spec talks about the namespace, not the type. I.e.
>>>>>>>> "namespace": ".ns" would be an error.
>>>>>>>
>>>>>>> The linked thread
>>>>>>> ( https://lists.apache.org/thread/q0o58fxgvstvdlgpoyv2pcz53borp587 )
>>>>>>> is a bit vague -- it's not totally clear whether the restriction is
>>>>>>> meant to apply to namespaces only, or to fullnames also.
>>>>>>>
>>>>>>> "The null namespace may not be used in a dot-separated sequence of
>>>>>>> names."
>>>>>>>
>>>>>>> certainly makes it sound like it applies to _any_ sequence of names,
>>>>>>> though, not just in a namespace field.
>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 22, 2022 at 10:40 PM Brennan Vincent <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> https://github.com/apache/avro/pull/917 introduced the following
>>>>>>>>> language to the spec:
>>>>>>>>>
>>>>>>>>>> The null namespace may not be used in a dot-separated sequence of
>>>>>>>>>> names.
>>>>>>>>>
>>>>>>>>> Thus ruling out fullnames like ".foo".
>>>>>>>>>
>>>>>>>>> However, this seems to rule out referring to names in the default
>>>>>>>>> namespace from another namespace.
>>>>>>>>>
>>>>>>>>> For example, this schema was previously allowed by the spec:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>>   "type": "record",
>>>>>>>>>   "name": "r",
>>>>>>>>>   "fields": [
>>>>>>>>>     {
>>>>>>>>>       "name": "f",
>>>>>>>>>       "type": {
>>>>>>>>>         "type": "record",
>>>>>>>>>         "name": "r2",
>>>>>>>>>         "namespace": "ns",
>>>>>>>>>         "fields": [
>>>>>>>>>           {
>>>>>>>>>             "name": "f2",
>>>>>>>>>             "type": ["null", ".r"]
>>>>>>>>>           }
>>>>>>>>>         ]
>>>>>>>>>       }
>>>>>>>>>     }
>>>>>>>>>   ]
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Note ".r" in the type of "f2". This can't be changed to "r",
>>>>>>>>> because that would be interpreted as "ns.r" due to "ns" being the
>>>>>>>>> nearest enclosing namespace.
>>>>>>>>>
>>>>>>>>> Thus it seems that the new spec has restricted the set of valid
>>>>>>>>> schemas and there is no longer any way to accomplish this.
>>>>>>>>>
>>>>>>>>> Am I misinterpreting the spec? Does the empty namespace being
>>>>>>>>> disallowed in dotted sequences of names only apply to initial name
>>>>>>>>> definitions, but not to later name references? Or is there some
>>>>>>>>> other way to express this?
>>>>>>>>>
>>>>>>>>> Here is the initial discussion of this change, where the issue I'm
>>>>>>>>> raising here doesn't appear to have come up:
>>>>>>>>> https://lists.apache.org/thread/q0o58fxgvstvdlgpoyv2pcz53borp587
>>>>>>>>>