The current Java code already accepts Unicode characters; I added a unit test
to show it:
<https://github.com/apache/avro/blob/8969bc15174b96ca17b8b264b596e0d3de7c9436/lang/java/avro/src/test/java/org/apache/avro/TestSchema.java#L72>.
It accepts Japanese and Chinese names, which matches our needs, but since this
is not official, our code converts them to unreadable names, which causes
issues (even though we keep the original name in a property of the schema field).
On Rust, it would imply this change:
<https://github.com/apache/avro/blob/8969bc15174b96ca17b8b264b596e0d3de7c9436/lang/rust/avro/src/schema.rs#L1283-L1293>
and on C, I made this PR: <https://github.com/apache/avro/pull/1798>
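
To illustrate the workaround mentioned above (a spec-compliant but unreadable
name, with the original kept in a field property), here is a rough sketch in
plain Python. The encoding scheme (`f_` plus the hex of the UTF-8 bytes), the
helper `to_field` and the property name `display.name` are assumptions made
for this example only; they are not what our code or Avro actually does.

```python
import json
import re

# Naming rule from the Avro specification: [A-Za-z_][A-Za-z0-9_]*
SPEC_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def to_field(original_name, avro_type="string"):
    """Build a field dict whose "name" satisfies the specification.

    Hypothetical scheme: a non-compliant name becomes "f_" + hex(UTF-8 bytes),
    and the original is kept under the custom property "display.name".
    """
    if SPEC_NAME.fullmatch(original_name):
        return {"name": original_name, "type": avro_type}
    encoded = "f_" + original_name.encode("utf-8").hex()
    return {"name": encoded, "type": avro_type, "display.name": original_name}


if __name__ == "__main__":
    # The generated name is valid but no longer readable by a human.
    print(json.dumps(to_field("名前"), ensure_ascii=False))
```

The generated name is valid everywhere, but anything that only looks at
`name` (logs, column headers, generated classes) shows the encoded form,
which is where the issues come from.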

For the Java implementation, changing the documentation wouldn't be a breaking
change, but adapting the code to strictly conform to the documentation would be.
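
To make the difference between documentation and code concrete, here is a
small sketch comparing the documented rule (`[A-Za-z_][A-Za-z0-9_]*`) with a
lenient Unicode-aware check. The lenient version is only an approximation
written for this example (Python's `str.isalpha` / `str.isalnum` do not match
Java's `Character.isLetter` / `Character.isLetterOrDigit` exactly), and both
function names are made up.

```python
import re

# Documented rule: start with [A-Za-z_], then only [A-Za-z0-9_].
SPEC_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_spec_name(name):
    """True if the name follows the rule written in the specification."""
    return SPEC_NAME.fullmatch(name) is not None


def is_unicode_name(name):
    """Lenient check (assumption): Unicode letter or '_' first,
    then Unicode letters, digits or '_'."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    if not (first.isalpha() or first == "_"):
        return False
    return all(c.isalnum() or c == "_" for c in rest)


if __name__ == "__main__":
    for candidate in ["first_name", "prénom", "名前", "1abc"]:
        print(candidate, is_spec_name(candidate), is_unicode_name(candidate))
```

Any name accepted by the first check is also accepted by the second, so
widening the documentation breaks nothing, while tightening the code to the
first check would reject schemas that are accepted today.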

This is why I proposed this JIRA
<https://issues.apache.org/jira/browse/AVRO-3532>: to widen the set of
accepted names and give Avro more possibilities.

On Thu, May 26, 2022 at 11:32 PM Nilesh Yadav <[email protected]> wrote:

> GCP storage/analytics supports Unicode characters in column names, but Avro,
> which is used for message transfer, does not. I'm trying to bridge the gap,
> which is why I'm looking for Unicode support in Avro schemas.
>
>
> On Mon, May 16, 2022 at 12:07 PM Ryan Skraba <[email protected]> wrote:
>
> > Hello,
> >
> > At this point, changing the naming rules of the specification would be
> > a pretty significant breaking change, and I don't think it's likely to
> > happen without a compelling champion that feels strongly about the
> > issue!  As far as I know, this is
> >
> > Using non-compliant names *might* work with some implementations, but
> > there's no guarantee that that schema would be interoperable between
> > Avro versions and languages.  This is true regardless of how the
> > schema was generated, unfortunately, including from Avro IDL... I
> > don't believe that the AVDL example above would interoperate with the
> > Python SDK.
> >
> > In my experience, the usual reason I've encountered for wanting to
> > have Unicode identifiers is to have better, language-specific names for
> > fields like "prénom".  As an alternative, in practice, this can be
> > accomplished by adding your own custom JSON property to the type or
> > field name, like "label" or "display.name" (or by reusing the "doc"
> > field) for the human-readable internationalized name.  This technique
> > has the advantage, as well, of potentially supporting multiple
> > languages with different properties, or allowing you to rename the
> > field without affecting the canonical schema.  The disadvantage is
> > that none of this is provided for you...
> >
> > Is there a specific use case that you're looking to support?
> >
> > Ryan
> >
> > On Tue, May 10, 2022 at 6:55 AM Oscar Westra van Holthe - Kind
> > <[email protected]> wrote:
> > >
> > > On  Mon 9 May 2022 23:27, Zoltan Csizmadia <[email protected]> wrote:
> > >
> > > > > Here are some protocol definition examples used for testing. They are
> > > > > not schemas, however it should work the same:
> > > > >
> > > > > https://github.com/apache/avro/blob/master/lang/java/compiler/src/test/idl/input/unicode.avdl
> > > > >
> > > > > https://github.com/apache/avro/blob/master/lang/java/compiler/src/test/idl/output/unicode.avpr
> > >
> > >
> > > As an aside, there are 2 PRs (#1588 [1] & #1589 [2]) for an ANTLR based
> > > grammar that can also support a schema file equivalent syntax.
> > >
> > > It won't help you now, but it may be something to keep an eye on.
> > >
> > > Kind regards,
> > > Oscar
> > >
> > >
> > > [1] https://github.com/apache/avro/pull/1588
> > > [2] https://github.com/apache/avro/pull/1589
> > >
> > > --
> > > Oscar Westra van Holthe - Kind <[email protected]>
> >
>
