+1 (non-binding)

First of all, thanks Rok for working on this 🙌 I raised the mentioned issue on GitHub back in December 2022, and I still believe it would be a good addition to the spec.

In Iceberg, UUIDs are encoded in big-endian byte order. For example, the
UUID f79c3e09-677c-4bbd-a479-3f349cb785e7 is encoded as the byte array:

F7 9C 3E 09 67 7C 4B BD A4 79 3F 34 9C B7 85 E7

Avro has long supported UUIDs as a logical type on top of a string, and now
also supports them on top of fixed[16]
<https://issues.apache.org/jira/browse/AVRO-3918>. That is the way to go
<https://docs.google.com/document/d/16_oSWrEM7AFUCTe0uuraAEHxywezLfoEz5ahzwvhGUk/edit#heading=h.43xuauwfk7ow>
and is also in line with the PR by Rok.

Kind regards,
Fokko

On Mon, 29 Apr 2024 at 20:37, Micah Kornfield <emkornfi...@gmail.com> wrote:

> You are correct, it looks like the UUID version should be encoded properly
> in the UUID data. I think another concern around endianness was raised,
> which should probably be resolved before the vote is finalized.
>
> Thanks,
> Micah
>
> On Monday, April 29, 2024, Felipe Oliveira Carvalho <felipe...@gmail.com>
> wrote:
>
> > Isn't that easily decodable from the UUID data itself?
> >
> > If you allow the version to be specified as metadata, you now have to
> > validate and make sure it's consistent with the version encoded in the
> > contents of the UUID column. And UUID versions are more of a concern
> > for UUID generation than consumption.
> >
> > --
> > Felipe
> >
> > On Mon, Apr 29, 2024 at 2:31 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> > >
> > > Apologies for the late reply, but I think being able to specify the
> > > UUID version as metadata might make sense in some cases?
> > >
> > > On Fri, Apr 19, 2024 at 1:22 PM Rok Mihevc <rok.mih...@gmail.com>
> > > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Following initial requests [1][2] and recent tangential ML
> > > > discussion [3], I would like to propose a vote to add language for
> > > > a UUID canonical extension type to CanonicalExtensions.rst as in
> > > > PR [4] and written below.
> > > > A draft C++ and Python implementation PR can be seen here [5].
> > > >
> > > > [1] https://lists.apache.org/thread/k2zvgoq62dyqmw3mj2t6ozfzhzkjkc4j
> > > > [2] https://github.com/apache/arrow/issues/15058
> > > > [3] https://lists.apache.org/thread/8d5ldl5cb7mms21rd15lhpfrv4j9no4n
> > > > [4] https://github.com/apache/arrow/pull/41299 <- proposed change
> > > > [5] https://github.com/apache/arrow/pull/37298
> > > >
> > > >
> > > > The vote will be open for at least 72 hours.
> > > >
> > > > [ ] +1 Accept this proposal
> > > > [ ] +0
> > > > [ ] -1 Do not accept this proposal because...
> > > >
> > > >
> > > > UUID
> > > > ====
> > > >
> > > > * Extension name: ``arrow.uuid``.
> > > >
> > > > * The storage type of the extension is ``FixedSizeBinary`` with a
> > > >   length of 16 bytes.
> > > >
> > > > .. note::
> > > >    A specific UUID version is not required or guaranteed. This
> > > >    extension represents UUIDs as FixedSizeBinary(16) and does not
> > > >    interpret the bytes in any way.
> > > >
> > > >
> > > > Rok
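
For illustration, a minimal sketch of the byte layout discussed in this thread, using only Python's standard-library `uuid` module (this is not the C++/Python implementation in PR [5]). It assumes the 16 bytes stored in a FixedSizeBinary(16) value are the big-endian form Fokko describes for Iceberg; the proposal text itself does not fix a byte order. It also shows Felipe's point that the version can be read from the data itself, since the RFC 4122 layout places it in the high nibble of byte 6.

```python
import uuid

# Example UUID from Fokko's message.
u = uuid.UUID("f79c3e09-677c-4bbd-a479-3f349cb785e7")

# UUID.bytes is the 16-byte big-endian (network byte order) form.
# Assumption: this is what a FixedSizeBinary(16) storage value would hold.
big_endian = u.bytes
assert len(big_endian) == 16
print(big_endian.hex(" ").upper())
# F7 9C 3E 09 67 7C 4B BD A4 79 3F 34 9C B7 85 E7

# For contrast, bytes_le stores the first three fields (time_low, time_mid,
# time_hi_version) in little-endian order, the mixed-endian layout of
# Microsoft GUIDs.
print(u.bytes_le.hex(" ").upper())
# 09 3E 9C F7 7C 67 BD 4B A4 79 3F 34 9C B7 85 E7

# The version is the high nibble of byte 6 and the variant is the top two
# bits of byte 8, so both can be read from the 16 bytes without metadata.
version = big_endian[6] >> 4
variant_bits = big_endian[8] >> 6
print(version, bin(variant_bits))
# 4 0b10

# Round-trip: the 16 stored bytes reconstruct the same UUID.
assert uuid.UUID(bytes=big_endian) == u
assert version == u.version
```

The `bytes_le` line is only there for contrast: a producer that wrote that layout into the column would store different bytes for the same UUID, which is why the endianness question Micah mentions matters for interoperability.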