On 2022-08-31 17:18, Martin Grigorov wrote:
> On Wed, Aug 31, 2022 at 9:59 PM Brennan Vincent <[email protected]>
> wrote:
>
>>
>>
>> On 2022-08-31 13:38, Ryan Skraba wrote:
>>> Hello!  I've been trying out some POC code with Java to see what would
>>> be the impact on that SDK -- in the past, a lot of the development has
>>> been pretty Java-centric, but this is definitely not a requirement!
>>>
>>> Currently, the worst scenario I found is something like:
>>>
>>> { "type" : "record",
>>>   "name" : "A",
>>>   "fields" : [ { "name" : "a1",
>>>     "type" : {
>>>       "type" : "record",
>>>       "name" : "B",
>>>       "fields" : [ { "name" : "b1",  "type" : [ "null", "A" ],
>>> "default" : null  } ] } } ] }
>>>
>>> This is a recursive definition that would look like a linked list
>>> alternating A records containing B records containing A records, etc.
>>>
>>> If you were to only change the name of B to test.B (A fully qualified
>>> namespace), Java can still parse the schema but the generated code
>>> unsurprisingly no longer compiles.  It correctly finds the outer
>>> schema (and doesn't try to look for test.A) but it's impossible to
>>> import into the generated Java code.
>>>
>>> If you were to only change the name of A to test.A, this is fine.
>>>
>>> I was playing around a bit with "auto-mangling" the packages to put A
>>> in root$.A for this case, but I think it's a hopeless case for Java --
>>> there are too many ways for the default package to "sneak" into the
>>> system from other previously compiled classes, or from IDL, etc.
>>>
>>> I think it's still possible to try and accept the .Foo syntax but we'd
>>> have to note that (for Java) mixing namespaced schemas and
>>> null-namespaced schemas is either not supported, or we supply a
>>> mechanism in Java to put ALL unnamespaced generated classes in a
>>> folder like root$.
>>>
>>> Thanks for pointing out part 4, I'm also taking a look at the impact
>>> there!  Given that these mixed namespace schemas are likely to already
>>> be broken, I don't know if it's too big of an impact!  Especially if
>>> we say that the dot is only added when strictly necessary to prevent
>>> namespace inheritance.
>>
>> There is still a question for non-mixed schemas.
>>
>> Consider the following schema:
>>
>> {
>>     "type": "fixed",
>>     "name": "Foo",
>>     "size": 10
>> }
>>
>> Now, if we clarify the spec to say that leading dots are valid in
>> default-namespace fullnames, then when this is normalized, the
>> current language of the description of PCF implies that its
>>
>
> Please copy/paste the text from the spec that implies that the name should
> be ".Foo".
> Otherwise we will have to guess which sentence you mean exactly.

[FULLNAMES] Replace short names with fullnames, using applicable namespaces
to do so. Then eliminate namespace attributes, which are now redundant.
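
To make the two readings of [FULLNAMES] concrete, here is a rough Python
sketch. It is not the avro package's code: fullname() and pcf_name() are
hypothetical helpers, and the leading_dot flag is only an illustrative
switch between the current behavior (bare names, as the Python SDK does
today) and the literal reading of the spec.

```python
def fullname(name, enclosing_namespace=""):
    """Resolve a name to a (namespace, simple_name) pair.

    A name containing a dot is treated as a fullname; a leading dot
    (".Foo") selects the null namespace. A name without a dot inherits
    the enclosing namespace.
    """
    if "." in name:
        namespace, _, simple = name.rpartition(".")
        return namespace, simple
    return enclosing_namespace, name


def pcf_name(namespace, simple, leading_dot=False):
    """Render a name for PCF (hypothetical helper, not an SDK function).

    leading_dot=False: null-namespace names stay bare ("Foo").
    leading_dot=True:  null-namespace names are written as ".Foo".
    """
    if namespace:
        return f"{namespace}.{simple}"
    return f".{simple}" if leading_dot else simple


# The top-level fixed schema "Foo" under both readings:
print(pcf_name(*fullname("Foo")))                    # Foo
print(pcf_name(*fullname("Foo"), leading_dot=True))  # .Foo

# Inside namespace "ns", "r" and ".r" are different types:
print(fullname("r", "ns"))   # ('ns', 'r')
print(fullname(".r", "ns"))  # ('', 'r')
```

The last two lines show why the dot cannot simply be dropped for inner
references: a bare "r" written inside "ns" is read back as ns.r.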

>
> I don't see any pluses or minuses in using the leading dot in the PCF for
> top-level names. IMO there is no difference with both representations.
> For inner names the leading dot should be preserved in the PCF. Otherwise
> it will start using the enclosing namespace after parsing.
>
>
>> name should be rewritten to ".Foo". However, this is contrary to current
>> behavior.
>>
>> So, if it's okay to change the behavior on existing valid schemas, then
>> we should do so. If it's not okay, then we should clarify the spec to
>> say that names are normalized to fullnames for PCF, _except_
>> in the special case of the default namespace.
>>
>>>
>>> I'll keep digging on the Java side.  Anybody else from the other SDKs
>>> want to weigh in?  What would happen with C# generated code?
>>>
>>> All my best, Ryan
>>>
>>>
>>>
>>> On Fri, Aug 26, 2022 at 4:10 PM Brennan Vincent <[email protected]>
>> wrote:
>>>>
>>>> I’m in favor of allowing .Foo as a fullname for the following reasons:
>>>>
>>>> 1. I believe the *intent* of the initial change to the spec was to only
>> refer to namespaces;
>>>> 2. Even if it is not possible in Java to generate code that refers to a
>> non-namespaced context from a namespaced one, it may be possible in other
>> languages;
>>>> 3. We do not lose anything by supporting it.
>>>> 4. Other parts of the spec assume that all names can be converted to a
>> fullname, specifically the parsing canonical form algorithm.
>>>>
>>>> Point 4. brings me to another issue. Currently, non-namespaced names
>> are left as bare names in PCF, at least by the Python SDK - they are not
>> converted to fullnames like .Foo (which makes sense, since that is out of
>> spec). However, it contradicts the spec:
>>>>
>>>> [FULLNAMES] Replace short names with fullnames, using applicable
>> namespaces to do so.
>>>>
>>>> The spec doesn’t say “only if a non-empty namespace is used”. It says
>> to always do this. So if we enable the ability to write fullnames like
>> .Foo, we need to decide whether to change the PCF behavior (this will
>> change the fingerprints of existing schemas) to match the spec, or change
>> the spec to match the current behavior.
>>>>
>>>>> On Aug 26, 2022, at 03:57, Ryan Skraba <[email protected]> wrote:
>>>>>
>>>>> Hello!  We can just discuss the impact here in the mailing list and
>>>>> make a decision by consensus.  Sometimes for major changes, we do a
>>>>> more formal VOTE thread -- this might be one of those cases.
>>>>>
>>>>> What would happen if we were to say that ".MyRecord" was valid in the
>>>>> next major version of Avro?
>>>>>
>>>>> Some SDKs used to accept this in the past and were made more strict,
>>>>> causing working examples to break?  That is really unfortunate.
>>>>>
>>>>> On the other hand, if we generate Java code today and map packages 1:1
>>>>> to namespaces... we still won't be able to mix namespaced (in a
>>>>> package) and unnamespaced (unpackaged) generated code.  Would we just
>>>>> mangle the default namespace to "default$" or ... ?  A configuration
>>>>> option for the SpecificCompiler in Java?
>>>>>
>>>>> Either way, it would be great if we didn't leave this point vague in
>>>>> the spec!   There's always the possibility to allow language SDKs to
>>>>> deviate from the spec -- if e.g. Python or Java has a
>>>>> "setValidateUnqualifiedNamespace(boolean)" method, we can leave it up
>>>>> to the user whether or not to follow the strict spec.  We already do
>>>>> this with validating defaults in Java, for example.
>>>>>
>>>>> It might take a bit of thought, but if we can find some elegant way to
>>>>> make this work I don't see why we wouldn't make specification changes!
>>>>>
>>>>> Ryan
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On Thu, Aug 25, 2022 at 7:31 PM Brennan Vincent <
>> [email protected]> wrote:
>>>>>>
>>>>>> That is a fair point also.
>>>>>>
>>>>>> Anyway, since I'm not an Apache project member, I'm not quite sure
>> what
>>>>>> is the best way to move forward here. Is there a formal process for
>> proposing
>>>>>> changes to the spec and reaching a consensus?
>>>>>>
>>>>>> Thanks
>>>>>> Brennan
>>>>>>
>>>>>>> On 2022-08-25 01:36, Oscar Westra van Holthe - Kind wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Allowing references to the null namespace from within another
>> namespace
>>>>>>> gives schema authors more options.
>>>>>>>
>>>>>>> But if you're using namespaces at all, there must be a reason for
>> it. As a
>>>>>>> schema author, you've made the decision to group your schemata.
>>>>>>>
>>>>>>> To make this decision from schema authors more visible, I'd opt to
>> choose
>>>>>>> the Java route and in that case force all schemata to belong to a
>> group.
>>>>>>> I.e., explicitly disallow identifiers to start with a dot (and
>> disallow
>>>>>>> references to the null namespace from within another namespace).
>>>>>>>
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Oscar
>>>>>>>
>>>>>>> --
>>>>>>> Oscar Westra van Holthe - Kind <[email protected]>
>>>>>>>
>>>>>>> Op wo 24 aug. 2022 14:42 schreef Ryan Skraba <[email protected]>:
>>>>>>>
>>>>>>>> Hello!  There is definitely an ambiguity here caused by inheriting
>>>>>>>> namespaces.
>>>>>>>>
>>>>>>>> The obvious takeaway is to use a namespace with all of your named
>>>>>>>> schemas.  As a best practice, that avoids the problem of mixing
>>>>>>>> schemas with and without namespaces, and it's probably this techniq
>>>>>>>>
>>>>>>>> This same problem occurs in Java classes, where you can have a class
>>>>>>>> in the default package (without a package name), but it's an error
>> to
>>>>>>>> import it into other packages.
>>>>>>>>
>>>>>>>> The ".MyRecord" notation might be the right way to clarify this, but
>>>>>>>> we can also go the Java route (i.e. you can't mix namespaced schemas
>>>>>>>> and non-namespaced schemas).  What do you think?
>>>>>>>>
>>>>>>>> Best regards, Ryan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 22, 2022 at 10:49 PM Brennan Vincent <
>> [email protected]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On 2022/08/22 20:05:22 Martin Grigorov wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I might be wrong but I think your sample schema should be valid!
>> Does
>>>>>>>> it
>>>>>>>>>> fail with any of the SDKs ?
>>>>>>>>>
>>>>>>>>> Yes. It fails with the Python avro package.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This part of the spec talks about the namespace, not the type.
>> I.e.
>>>>>>>>>> "namespace": ".ns" would be an error.
>>>>>>>>>
>>>>>>>>> The linked thread (
>>>>>>>> https://lists.apache.org/thread/q0o58fxgvstvdlgpoyv2pcz53borp587 )
>>>>>>>>> is a bit vague -- it's not totally clear whether the restriction is
>>>>>>>> meant to apply to
>>>>>>>>> namespaces only, or to fullnames also.
>>>>>>>>>
>>>>>>>>> "The null namespace may not be used in a dot-separated sequence of
>>>>>>>> names."
>>>>>>>>>
>>>>>>>>> certainly makes it sound like it applies to _any_ sequence of
>> names,
>>>>>>>> though,
>>>>>>>>> not just in a namespace field.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 22, 2022 at 10:40 PM Brennan Vincent <
>> [email protected]
>>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/apache/avro/pull/917 introduced the following
>>>>>>>> language
>>>>>>>>>>> to the spec:
>>>>>>>>>>>
>>>>>>>>>>>> The null namespace may not be used in a dot-separated sequence
>> of
>>>>>>>> names.
>>>>>>>>>>>
>>>>>>>>>>> Thus ruling out fullnames like ".foo".
>>>>>>>>>>>
>>>>>>>>>>> However, this seems to rule out referring to names in the default
>>>>>>>>>>> namespace from another namespace.
>>>>>>>>>>>
>>>>>>>>>>> For example, this schema was previously allowed by the spec:
>>>>>>>>>>>
>>>>>>>>>>> {
>>>>>>>>>>>    "type": "record",
>>>>>>>>>>>    "name": "r",
>>>>>>>>>>>    "fields": [
>>>>>>>>>>>        {
>>>>>>>>>>>            "name": "f",
>>>>>>>>>>>            "type": {
>>>>>>>>>>>                "type": "record",
>>>>>>>>>>>                "name": "r2",
>>>>>>>>>>>                "namespace": "ns",
>>>>>>>>>>>                "fields": [
>>>>>>>>>>>                    {
>>>>>>>>>>>                        "name": "f2",
>>>>>>>>>>>                        "type": ["null", ".r"]
>>>>>>>>>>>                    }
>>>>>>>>>>>                ]
>>>>>>>>>>>            }
>>>>>>>>>>>        }
>>>>>>>>>>>    ]
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Note ".r" in the type of "f2". This can't be changed to "r",
>>>>>>>>>>> because that would be interpreted as "ns.r" due to "ns" being the
>>>>>>>> nearest
>>>>>>>>>>> enclosing namespace.
>>>>>>>>>>>
>>>>>>>>>>> Thus it seems that the new spec has restricted the set of valid
>>>>>>>> schemas
>>>>>>>>>>> and there is no longer
>>>>>>>>>>> any way to accomplish this.
>>>>>>>>>>>
>>>>>>>>>>> Am I misinterpreting the spec? Does the empty namespace being
>>>>>>>> disallowed
>>>>>>>>>>> in dotted sequences
>>>>>>>>>>> of names only apply to initial name definitions, but not to later
>>>>>>>> name
>>>>>>>>>>> references? Or is there
>>>>>>>>>>> some other way to express this?
>>>>>>>>>>>
>>>>>>>>>>> Here is the initial discussion of this change, where the issue
>> I'm
>>>>>>>> raising
>>>>>>>>>>> here doesn't
>>>>>>>>>>> appear to have come up:
>>>>>>>>>>> https://lists.apache.org/thread/q0o58fxgvstvdlgpoyv2pcz53borp587
>>>>>>>>>>>
