Hey everyone, Happy New Year! Best wishes for 2024 for you and your family.
I went ahead and created a PR for the spec change:
https://github.com/apache/avro/pull/2672

Let me know if there are any questions or concerns.

Kind regards,
Fokko

On Fri, 22 Dec 2023 at 14:52, Fokko Driesprong <fo...@apache.org> wrote:

> Hi Martin and Scott,
>
> Thanks for the question, and that's a good one. I would suggest:
>
> {
>   "type": "fixed",
>   "size": 16,
>   "logicalType": "uuid"
> }
>
> This is in line with the other logicalTypes. For example with date:
>
> {
>   "type": "int",
>   "logicalType": "date"
> }
>
> If you don't support the date, you can still read the int itself (days
> since Epoch).
>
> I've added a schema example to the Google doc and created a PR
> <https://github.com/apache/avro/pull/2646/> to clarify the current
> situation.
>
> I am curious about what you guys think of the proposed JSON-type
> representation.
>
> Kind regards,
> Fokko
>
> On Fri, 22 Dec 2023 at 14:25, Scott Belden <scottabel...@gmail.com> wrote:
>
>> I think you'd have to go with something like one of the first two options
>> (something in the schema) rather than some flag in a library. The problem
>> with a flag in a library is if someone has an avro file they want to
>> deserialize, they might not know if it was encoded with uuids as bytes or
>> strings and they'd be left with guessing one and trying again with the
>> second if the first failed which would not be a pleasant experience.
>>
>> -Scott
>>
>> On Fri, Dec 22, 2023 at 5:00 AM Martin Grigorov <mgrigo...@apache.org>
>> wrote:
>>
>> > Hi,
>> >
>> > How would the application tell Avro what storage type to use - String or
>> > bytes ?
>> > - new logical type ? e.g. "logicalType": "uuid-bytes"
>> > - extra attribute ? e.g. { ..., "logicalType": "uuid", "storage-type":
>> > "bytes" }
>> > - global switch that tells the library to always use "string" or "bytes"
>> > for all UUIDs ?
>> > - ...
>> >
>> > Martin
>> >
>> > On Fri, Dec 22, 2023 at 10:49 AM Fokko Driesprong <fo...@apache.org>
>> > wrote:
>> >
>> > > Hey everyone,
>> > >
>> > > For Iceberg we're using UUIDs in Avro and we're storing them as binary,
>> > > rather than a string. This has several advantages such as more compact
>> > > storage, more efficient reading, and more efficient skipping. For more
>> > > details, please check out the doc that I've created
>> > > <https://docs.google.com/document/d/16_oSWrEM7AFUCTe0uuraAEHxywezLfoEz5ahzwvhGUk/edit#heading=h.43xuauwfk7ow>
>> > > (and feel free to comment). Also created AVRO-3918
>> > > <https://issues.apache.org/jira/browse/AVRO-3918> on Jira to track this.
>> > >
>> > > Looking forward to hearing from y'all!
>> > >
>> > > Kind regards and happy holidays,
>> > >
>> > > Fokko Driesprong
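
For anyone who wants to see the size difference discussed in this thread, here is a rough Python sketch comparing the two encodings of a single UUID value. It is only an illustration under a few assumptions, not how any Avro library implements it: the helper names and the "name" value in the schema are made up for the example (Avro requires fixed types to be named, and the thread's schema leaves the name out), the byte order is whatever Python's uuid.UUID.bytes returns (big-endian), and the string encoder only handles lengths that fit in a single varint byte.

```python
import json
import uuid

# Schema proposed in the thread, plus a hypothetical "name" field,
# since Avro requires every fixed type to carry a name.
UUID_FIXED_SCHEMA = json.loads("""
{
  "type": "fixed",
  "name": "uuid_fixed",
  "size": 16,
  "logicalType": "uuid"
}
""")


def encode_uuid_fixed(value: uuid.UUID) -> bytes:
    """An Avro fixed is written as its raw bytes, with no length prefix."""
    raw = value.bytes  # 16 bytes, big-endian (assumed byte order)
    assert len(raw) == UUID_FIXED_SCHEMA["size"]
    return raw


def decode_uuid_fixed(raw: bytes) -> uuid.UUID:
    """Rebuild the UUID from the 16 raw bytes."""
    return uuid.UUID(bytes=raw)


def encode_uuid_string(value: uuid.UUID) -> bytes:
    """An Avro string is a zigzag varint length followed by UTF-8 bytes."""
    text = str(value).encode("utf-8")  # canonical 36-character form
    # Simplification: lengths below 64 zigzag-encode into one varint byte.
    assert len(text) < 64
    return bytes([len(text) << 1]) + text


if __name__ == "__main__":
    u = uuid.uuid4()
    print("fixed :", len(encode_uuid_fixed(u)), "bytes")
    print("string:", len(encode_uuid_string(u)), "bytes")
```

A reader that does not know the uuid logical type can still read the 16 raw bytes, in the same way the date example falls back to a plain int. Skipping is also cheaper for the fixed form, because its size comes from the schema rather than from a per-value length prefix.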