The current Java code already accepts Unicode characters; I added a unit test to show it <https://github.com/apache/avro/blob/8969bc15174b96ca17b8b264b596e0d3de7c9436/lang/java/avro/src/test/java/org/apache/avro/TestSchema.java#L72>. It accepts Japanese and Chinese names, which matches our needs, but since this behaviour is not official, our code converts such names into unreadable ones, and that causes issues (even though we keep the original name in a property of the schema field). In Rust, it would imply this change <https://github.com/apache/avro/blob/8969bc15174b96ca17b8b264b596e0d3de7c9436/lang/rust/avro/src/schema.rs#L1283-L1293>, and for C, I made this PR <https://github.com/apache/avro/pull/1798>.
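To make the gap concrete, here is a minimal Python sketch comparing the ASCII-only naming rule from the specification with a relaxed, Unicode-aware check. The function names are hypothetical, and the relaxed rule is only an assumption about what "accepting Unicode names" could mean (it relies on Python's `str.isalpha` and `str.isalnum`); it is not the code of any Avro implementation.

```python
import re

# Naming rule as written in the Avro specification: ASCII letters, digits, '_'.
SPEC_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_spec_name(name: str) -> bool:
    """True if `name` matches the ASCII-only rule from the specification."""
    return SPEC_NAME.fullmatch(name) is not None


def is_unicode_name(name: str) -> bool:
    """Hypothetical relaxed rule (an assumption, not an official Avro rule):
    a Unicode letter or '_' first, then Unicode letters, digits, or '_'."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    if not (first.isalpha() or first == "_"):
        return False
    return all(ch.isalnum() or ch == "_" for ch in rest)


if __name__ == "__main__":
    # Print both verdicts side by side for a few candidate field names.
    for candidate in ["first_name", "prénom", "名前", "1st"]:
        print(candidate, is_spec_name(candidate), is_unicode_name(candidate))
```

Names that pass only the relaxed check are the ones that may work in one implementation and be rejected in another.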
For the Java implementation, changing the documentation wouldn't be a breaking change, but adapting the code to strictly conform to the documentation would be. This is why I proposed this JIRA <https://issues.apache.org/jira/browse/AVRO-3532> to widen the set of accepted names and give Avro more possibilities.

On Thu, May 26, 2022 at 23:32, Nilesh Yadav <[email protected]> wrote:

> GCP storage\analytics support unicode characters in column name but Avro
> which is used for message transfer does not. I'm trying to bridge the gap.
> Which is the reason I'm looking for unicode support in Avro schema.
>
>
> On Mon, May 16, 2022 at 12:07 PM Ryan Skraba <[email protected]> wrote:
>
> > Hello,
> >
> > At this point, changing the naming rules of the specification would be
> > a pretty significant breaking change, and I don't think it's likely to
> > happen without a compelling champion that feels strongly about the
> > issue! As far as I know, this is
> >
> > Using non-compliant names *might* work with some implementations, but
> > there's no guarantee that that schema would be interoperable between
> > Avro versions and languages. This is true regardless of how the
> > schema was generated, unfortunately, including from Avro IDL... I
> > don't believe that the AVDL example above would interoperate with the
> > Python SDK.
> >
> > In my experience, the usual reason I've encountered for wanting to
> > have unicode identifiers is have better, language-specific names or
> > fields like "prénom". As an alternative, in practice, this can be
> > accomplished by adding your own custom JSON property to the type or
> > field name, like "label" or "display.name" (or by reusing the "doc"
> > field) for the human-readable internationalized name. This technique
> > has the advantage, as well, of potentially supporting multiple
> > languages with different properties, or allowing you to rename the
> > field without affecting the canonical schema. The disadvantage is
> > that none of this is provided for you...
> >
> > Is there a specific use case that you're looking to support?
> >
> > Ryan
> >
> > On Tue, May 10, 2022 at 6:55 AM Oscar Westra van Holthe - Kind
> > <[email protected]> wrote:
> > >
> > > On Mon 9 May 2022 23:27, Zoltan Csizmadia <[email protected]> wrote:
> > >
> > > > Here are some protocol definition examples used for testing. They are not
> > > > schemas, however it should work the same:
> > > >
> > > > https://github.com/apache/avro/blob/master/lang/java/compiler/src/test/idl/input/unicode.avdl
> > > >
> > > > https://github.com/apache/avro/blob/master/lang/java/compiler/src/test/idl/output/unicode.avpr
> > >
> > >
> > > As an aside, there are 2 PRs (#1588 [1] & #1589 [2]) for an ANTLR based
> > > grammar that can also support a schema file equivalent syntax.
> > >
> > > It won't help you now, but it may be something to keep an eye on.
> > >
> > > Kind regards,
> > > Oscar
> > >
> > >
> > > [1] https://github.com/apache/avro/pull/1588
> > > [2] https://github.com/apache/avro/pull/1589
> > >
> > > --
> > > Oscar Westra van Holthe - Kind <[email protected]>
> > >
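
The custom-property alternative described in Ryan's quoted message can be sketched as follows. This is only an illustration under stated assumptions: the record, its field names, and the helper `display_names` are hypothetical, and the schema is handled as plain JSON with Python's standard `json` module rather than through any Avro library. Field names stay ASCII, and the human-readable name travels in a custom "label" property.

```python
import json

# Hypothetical schema: ASCII field names, with the localized name kept
# in a custom "label" property as suggested in the quoted message.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "first_name", "type": "string", "label": "prénom"},
    {"name": "age", "type": "int"}
  ]
}
"""


def display_names(schema: dict) -> dict:
    """Map each field name to its "label" property, falling back to the
    field name itself when no label is present."""
    return {
        field["name"]: field.get("label", field["name"])
        for field in schema.get("fields", [])
    }


if __name__ == "__main__":
    schema = json.loads(SCHEMA_JSON)
    for name, label in display_names(schema).items():
        print(name, "->", label)
```

As the quoted message notes, nothing here is provided by Avro itself: every producer and consumer has to agree on the property name and apply the mapping on its own.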
