Thanks for the comment, Oscar!

On Tue, Nov 21, 2023 at 5:06 PM Oscar Westra van Holthe - Kind <os...@westravanholthe.nl> wrote:
> Hi all,
>
> On the topic of allowed characters, we've settled in the (recent) past on
> alphanumeric characters only for simple names. Reasoning was that this is
> most portable.
>
> Remembering that, allowing the use of hyphens (or underscores) will not
> unduly complicate things. I would, however, draw a line at characters that
> are normally not found in words, like !@#€$%^&*(){}[],<>/?|"';:...
>
> Specifically, I think we should only add to the spec to allow words of
> alphanumeric characters, optionally in snake or kebab case. In a regular
> expression, this would mean: [a-zA-Z][0-9a-zA-Z_-]*[0-9a-zA-Z]?
>

+1
I like the new regex!

Martin

>
> Kind regards,
> Oscar
>
>
> On Mon, 13 Nov 2023 at 21:06, Martin Grigorov <mgrigo...@apache.org>
> wrote:
>
> > Hi Jon,
> >
> > Thank you for this email!
> >
> > On Mon, Nov 13, 2023 at 7:56 PM Jonathan Slusher <jonslus...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I opened an issue in the AVRO project in Jira
> > > <https://issues.apache.org/jira/browse/AVRO-3900> and I’ve been asked to
> > > submit a topic for discussion to this email group.
> > >
> > > See this issue in the rust schema_registry_converter repo
> > > <https://github.com/gklijs/schema_registry_converter/issues/100> for
> > > details specific to the crate in rust that we’re having trouble with:
> > >
> > > A couple of things to point out here:
> > >
> > > 1. I understand that at this time the Avro spec does not allow hyphens in
> > > its namespaces, but somehow our registry is allowing them to be created
> > > from our Debezium connectors. We have been using the confluent_python
> > > <https://github.com/confluentinc/confluent-kafka-python> module and since
> > > version 1.9.2, its deserializer seems to handle these hyphens without
> > > error. We also have several JDBC sink connectors with consumer groups that
> > > are able to use these topics.
> > >
> > > 2. We recently attempted to implement a consumer written in rust and the
> > > crate <https://github.com/gklijs/schema_registry_converter> above, which
> > > is used for deserialization, throws an exception when attempting to connect
> > > to these topics.
> > >
> > > ```
> > > thread 'main' panicked at /app/src/utils/kafka.rs:67:35:
> > > Error decoding value: Error: Supplied raw value "{\"type\":\"record\",\"name\":\"Envelope\",\"namespace\":\"debezium.abc-123-efg-20231005.table.u_table_dbz\",\"fields\":[{\"name\":\"before\",\"type\":[\"null\",{\"type\":\"record\",\"name\":\"Value\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"uid\",\"type\":{\"type\":\"long\",\"connect.default\":0},\"default\":0},{\"name\":\"release_id\",\"type\":{\"type\":\"long\",\"connect.default\":0},\"default\":0},{\"name\":\"notes\",\"type\":\"string\"},{\"name\":\"notes_public\",\"type\":{\"type\":\"string\",\"connect.version\":1,\"connect.parameters\":{\"allowed\":\"Y,N\"},\"connect.default\":\"N\",\"connect.name\":\"io.debezium.data.Enum\"},\"default\":\"N\"},{\"name\":\"added_ts\",\"type\":{\"type\":\"long\",\"connect.version\":1,\"connect.name\":\"io.debezium.time.Timestamp\"}}],\"connect.name\":\"debezium.abc-123-efg-20231005.table.u_table_dbz.Value\"}],\"default\":null},{\"name\":\"after\",\"type\":[\"null\",\"Value\"],\"default\":null},{\"name\":\"source\",\"type\":{\"type\":\"record\",\"name\":\"Source\",\"namespace\":\"io.debezium.connector.mysql\",\"fields\":[{\"name\":\"version\",\"type\":\"string\"},{\"name\":\"connector\",\"type\":\"string\"},{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"ts_ms\",\"type\":\"long\"},{\"name\":\"snapshot\",\"type\":[{\"type\":\"string\",\"connect.version\":1,\"connect.parameters\":{\"allowed\":\"true,last,false,incremental\"},\"connect.default\":\"false\",\"connect.name\":\"io.debezium.data.Enum\"},\"null\"],\"default\":\"false\"},{\"name\":\"db\",\"type\":\"string\"},{\"name\":\"sequence\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"table\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"server_id\",\"type\":\"long\"},{\"name\":\"gtid\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"file\",\"type\":\"string\"},{\"name\":\"pos\",\"type\":\"long\"},{\"name\":\"row\",\"type\":\"int\"},{\"name\":\"thread\",\"type\":[\"null\",\"long\"],\"default\":null},{\"name\":\"query\",\"type\":[\"null\",\"string\"],\"default\":null}],\"connect.name\":\"io.debezium.connector.mysql.Source\"}},{\"name\":\"op\",\"type\":\"string\"},{\"name\":\"ts_ms\",\"type\":[\"null\",\"long\"],\"default\":null},{\"name\":\"transaction\",\"type\":[\"null\",{\"type\":\"record\",\"name\":\"block\",\"namespace\":\"event\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"total_order\",\"type\":\"long\"},{\"name\":\"data_collection_order\",\"type\":\"long\"}],\"connect.version\":1,\"connect.name\":\"event.block\"}],\"default\":null}],\"connect.version\":1,\"connect.name\":\"debezium.abc-123-efg-20231005.table.u_table_dbz.Envelope\"}" cant be turned into a Schema, was cause by Invalid namespace debezium.abc-123-efg-20231005.table.u_table_dbz. It must match the regex '^([A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*)?$', it's retriable: false, it's cached: false
> > > ```
> > >
> > > Ideally, the Avro spec would just accept hyphens since they’re a pretty
> > > common character and unavoidable in certain circumstances. If this is
> > > easier said than done, I think at the least any library used for Avro
> > > deserialization should account for them, including the rust library. If
> > > this works in Java and Python, shouldn’t it also work in rust?
> > >
> >
> > This is exactly the reason I asked you to raise this question here in the
> > mailing list!
> > I also agree that if most/all of the SDKs will allow hyphens in the
> > name[space] then it is a better idea to add it to the list of allowed
> > characters in the specification instead of adding logic to disable the
> > validation.
> >
> > @Avro devs: What is your opinion ?
> >
> > Martin
> >
> >
> > >
> > > Here’s a generic example of a schema created by a Debezium connector:
> > >
> > > ```
> > > {
> > >   "type": "record",
> > >   "name": "Envelope",
> > >   "namespace": "abc-123-efg-20231005.table.u_table_dbz",
> > >   "fields": [
> > >     {
> > >       "name": "before",
> > >       "type": [
> > >         "null",
> > >         {
> > >           ...
> > > ```
> > >
> > > Please let me know if you need any more details, and thank you!
> > >
> > > Jon Slusher
> > >
> >
>
> --
> ✉️ Oscar Westra van Holthe - Kind <os...@westravanholthe.nl>
>
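
For anyone who wants to try the two patterns from this thread side by side, here is a minimal Python sketch. It compares the namespace regex quoted in the Rust error message with the name pattern Oscar proposed. The helper names `valid_current` and `valid_proposed` are made up for illustration. Applying the proposed pattern to each dot-separated part of a namespace, and treating an empty namespace as valid, are assumptions on my part rather than anything the thread or the spec states.

```python
import re

# Namespace regex as quoted in the Rust error message earlier in this thread.
CURRENT_NAMESPACE = re.compile(
    r"^([A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*)?$"
)

# Pattern for a single name as proposed by Oscar in this thread.
PROPOSED_NAME = re.compile(r"[a-zA-Z][0-9a-zA-Z_-]*[0-9a-zA-Z]?")


def valid_current(namespace: str) -> bool:
    """Check a whole namespace against the regex from the error message."""
    return CURRENT_NAMESPACE.fullmatch(namespace) is not None


def valid_proposed(namespace: str) -> bool:
    """Check each dot-separated part against the proposed name pattern.

    Assumption: the proposed pattern applies per part, and an empty
    namespace stays valid (the quoted regex also accepts it).
    """
    if namespace == "":
        return True
    return all(PROPOSED_NAME.fullmatch(part) for part in namespace.split("."))


if __name__ == "__main__":
    samples = [
        "debezium.abc-123-efg-20231005.table.u_table_dbz",  # from the error log
        "io.debezium.connector.mysql",
        "_private.ns",  # leading underscore
        "abc-",  # trailing hyphen
    ]
    for ns in samples:
        print(f"{ns!r}: current={valid_current(ns)} proposed={valid_proposed(ns)}")
```

Two things stand out when running it. The proposed pattern no longer accepts a leading underscore, which the quoted regex does. Also, because the final character class is optional, a name ending in a hyphen such as `abc-` still matches, which may or may not be intended.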