+1 (non-binding)

First of all, thanks Rok for working on this 🙌 I opened the GitHub issue
mentioned in the proposal back in December 2022, and I still believe this
would be a good addition to the spec.

In Iceberg, UUIDs are encoded in big-endian byte order. For example, the UUID
f79c3e09-677c-4bbd-a479-3f349cb785e7 is encoded as the byte array F7 9C 3E
09 67 7C 4B BD A4 79 3F 34 9C B7 85 E7. Avro has long supported UUIDs as a
logical type on top of a string, and now also on top of fixed[16]
<https://issues.apache.org/jira/browse/AVRO-3918>. Fixed[16] is the way to go
<https://docs.google.com/document/d/16_oSWrEM7AFUCTe0uuraAEHxywezLfoEz5ahzwvhGUk/edit#heading=h.43xuauwfk7ow>,
and it is also in line with Rok's PR.
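To illustrate the byte order above, here is a minimal sketch using only Python's standard `uuid` module (an assumption for illustration; it says nothing about how Iceberg or Arrow implement this). `UUID.bytes` yields the big-endian layout from the example, while `UUID.bytes_le` yields the mixed-endian layout that an implementation could accidentally produce:

```python
import uuid

# The example UUID from the paragraph above.
u = uuid.UUID("f79c3e09-677c-4bbd-a479-3f349cb785e7")

# UUID.bytes is the big-endian (network order) encoding: the byte order
# matches the hex digits of the canonical string form.
big_endian = u.bytes
print(big_endian.hex(" ").upper())
# -> F7 9C 3E 09 67 7C 4B BD A4 79 3F 34 9C B7 85 E7

# UUID.bytes_le stores the first three fields (time_low, time_mid,
# time_hi_version) in little-endian order; mixing up the two layouts
# is the kind of endianness bug a spec should rule out.
little_endian = u.bytes_le
print(little_endian.hex(" ").upper())
# -> 09 3E 9C F7 7C 67 BD 4B A4 79 3F 34 9C B7 85 E7

# Decoding the big-endian bytes recovers the original UUID.
assert uuid.UUID(bytes=big_endian) == u
```

Only the first eight bytes differ between the two layouts; the last eight are identical, which is why a byte-order mismatch can go unnoticed in casual inspection.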

Kind regards,
Fokko



On Mon, 29 Apr 2024 at 20:37, Micah Kornfield <emkornfi...@gmail.com> wrote:

> You are correct; it looks like the UUID version should already be encoded
> properly in the UUID data. I think another concern, around endianness, was
> raised, and it should probably be resolved before the vote is finalized.
>
> Thanks,
> Micah
>
> On Monday, April 29, 2024, Felipe Oliveira Carvalho <felipe...@gmail.com>
> wrote:
>
> > Isn't that easily decodable from the UUID data itself?
> >
> > If you allow the version to be specified as metadata, you now have to
> > validate and make sure it's consistent with the version encoded in the
> > contents of the UUID column. And UUID versions are more of a concern
> > for UUID generation than consumption.
> >
> > --
> > Felipe
> >
> > On Mon, Apr 29, 2024 at 2:31 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> > >
> > > Apologies for the late reply, but I think being able to specify the
> > > UUID version as metadata might make sense in some cases?
> > >
> > > On Fri, Apr 19, 2024 at 1:22 PM Rok Mihevc <rok.mih...@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Following initial requests [1][2] and a recent tangential mailing-list
> > > > discussion [3], I would like to propose a vote to add language for a
> > > > UUID canonical extension type to CanonicalExtensions.rst, as in PR [4]
> > > > and as written below.
> > > > A draft C++ and Python implementation PR can be seen here [5].
> > > >
> > > > [1] https://lists.apache.org/thread/k2zvgoq62dyqmw3mj2t6ozfzhzkjkc4j
> > > > [2] https://github.com/apache/arrow/issues/15058
> > > > [3] https://lists.apache.org/thread/8d5ldl5cb7mms21rd15lhpfrv4j9no4n
> > > > [4] https://github.com/apache/arrow/pull/41299 <- proposed change
> > > > [5] https://github.com/apache/arrow/pull/37298
> > > >
> > > >
> > > > The vote will be open for at least 72 hours.
> > > >
> > > > [ ] +1 Accept this proposal
> > > > [ ] +0
> > > > [ ] -1 Do not accept this proposal because...
> > > >
> > > >
> > > > UUID
> > > > ====
> > > >
> > > > * Extension name: ``arrow.uuid``.
> > > >
> > > > * The storage type of the extension is ``FixedSizeBinary`` with a
> > > >   length of 16 bytes.
> > > >
> > > > .. note::
> > > >    A specific UUID version is not required or guaranteed. This
> > > >    extension represents UUIDs as ``FixedSizeBinary(16)`` and does not
> > > >    interpret the bytes in any way.
> > > >
> > > >
> > > >
> > > > Rok
> > > >
> >
>
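Felipe's point that the version is recoverable from the UUID data itself can be made concrete. Below is a minimal sketch using only Python's standard library. The `pack_fixed_size_binary` and `unpack_fixed_size_binary` helpers are hypothetical: they only mimic the contiguous 16-byte slots of a `FixedSizeBinary(16)` data buffer (validity bitmap omitted) and are not an Arrow API. The version is read from the high nibble of byte 6, per RFC 4122, with no extension metadata involved:

```python
import uuid
from typing import List


def pack_fixed_size_binary(values: List[uuid.UUID]) -> bytes:
    """Hypothetical helper: concatenate big-endian UUID bytes into one buffer
    of 16-byte slots, like the data buffer of a FixedSizeBinary(16) array
    (the validity bitmap is omitted for brevity)."""
    return b"".join(v.bytes for v in values)


def unpack_fixed_size_binary(buf: bytes, width: int = 16) -> List[bytes]:
    """Hypothetical helper: split the buffer back into fixed-width slots."""
    if len(buf) % width != 0:
        raise ValueError("buffer length is not a multiple of the byte width")
    return [buf[i:i + width] for i in range(0, len(buf), width)]


def is_rfc4122_variant(raw: bytes) -> bool:
    # The two most significant bits of byte 8 are 0b10 for RFC 4122 UUIDs;
    # the version field is only meaningful for this variant.
    return (raw[8] & 0xC0) == 0x80


def uuid_version(raw: bytes) -> int:
    # The version is the high nibble of byte 6 (the time_hi_and_version field).
    return raw[6] >> 4


values = [
    uuid.UUID("f79c3e09-677c-4bbd-a479-3f349cb785e7"),  # the example from the thread
    uuid.uuid1(),  # a time-based UUID
]
buf = pack_fixed_size_binary(values)
slots = unpack_fixed_size_binary(buf)
versions = [uuid_version(raw) for raw in slots if is_rfc4122_variant(raw)]
print(len(buf), versions)  # -> 32 [4, 1]

# Cross-check against the standard library's own decoding.
assert versions == [uuid.UUID(bytes=raw).version for raw in slots]
```

Because the version already lives in the bytes, a separate version field in the type metadata would have to be validated against every value, which is the consistency concern Felipe raises.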
