Without time-based UUIDs as a special type, I think these aren't as useful, since the only comparison that is meaningful on a non-time UUID is equality. TimeUUIDs need their own comparator (and type) because they are not lexicographically comparable, but with one you can actually benefit from range predicates as well as equality. I think the biggest benefit Iceberg gives us is file pruning, and if a special UUID type can't make pruning much better, I think it may not be worth the complexity.
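To make the comparator point concrete, here is a minimal Python sketch. It is an illustration only, not anything taken from Iceberg or an engine. It assumes RFC 4122 version-1 (time-based) UUIDs; `make_v1` is a hypothetical helper and the node value is arbitrary. Because the low 32 bits of the timestamp come first in the byte layout, sorting by raw bytes can disagree with sorting by timestamp.

```python
import uuid


def make_v1(timestamp: int, clock_seq: int = 0, node: int = 0x010203040506) -> uuid.UUID:
    """Hypothetical helper: build a version-1 UUID from a 60-bit timestamp."""
    time_low = timestamp & 0xFFFFFFFF                        # low 32 bits are stored FIRST
    time_mid = (timestamp >> 32) & 0xFFFF                    # next 16 bits
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # top 12 bits plus version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant bits
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


# Two timestamps one tick apart, straddling a rollover of the low 32 bits.
earlier = make_v1(0x0FFFFFFFF)
later = make_v1(0x100000000)

# Byte-wise (lexicographic) order versus timestamp order.
by_bytes = sorted([earlier, later], key=lambda u: u.bytes)
by_time = sorted([earlier, later], key=lambda u: u.time)

print("by bytes:", [str(u) for u in by_bytes])
print("by time: ", [str(u) for u in by_time])
```

The two orderings differ here, which is why min/max stats computed over the raw 16 bytes would not help prune files for a range predicate on the embedded time unless the type carries its own comparator.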
Storing a UUID as a string is a pretty wasteful choice, but not something I think we should add another type to avoid. So I'm at best +0 on UUIDs.

> On Jul 27, 2021, at 11:54 PM, Jack Ye <yezhao...@gmail.com> wrote:
>
> Yes I agree with Jacques that fixed binary is what it is in the end. I think it is more about user experience, whether the conversion is done at the user side or Iceberg and engine side. Many people just store UUID as a 36 byte string instead of a 16 byte binary, so with an explicit UUID type, Iceberg can optimize this common use case internally for users. There might be some other benefits I overlooked, but maybe the complication introduced by this type does not really justify the slightly better user experience. I am also on the fence about it.
>
> -Jack Ye
>
> On Tue, Jul 27, 2021 at 7:54 PM Jacques Nadeau <jacquesnad...@gmail.com> wrote:
> What specific arguments are there for it being a first class type besides it is elsewhere? Is there some kind of optimization iceberg or an engine could do if it was typed versus just a bucket of bits? Fixed width binary seems to cover the cases I see in terms of actual functionality in the iceberg libraries or engines…
>
> On Tue, Jul 27, 2021 at 6:54 PM Yan Yan <yyany...@gmail.com> wrote:
> One conversation I used to come across regarding UUID deprecation was from https://github.com/apache/iceberg/pull/1611
>
> Thanks,
> Yan
>
> On Tue, Jul 27, 2021 at 1:07 PM Peter Vary <pv...@cloudera.com.invalid> wrote:
> Hi Joshua,
>
> I do not have a strong preference about the UUID type, but I would like to highlight that the type is handled inconsistently in Iceberg with different file formats. (See: https://github.com/apache/iceberg/issues/1881)
>
> If we keep the type, it would be good to standardize the handling in every file format.
>
> Thanks, Peter
>
> On Tue, 27 Jul 2021, 17:08 Joshua Howard, <joshthow...@gmail.com> wrote:
> Hi.
>
> UUID is a current data type according to the Iceberg spec (https://iceberg.apache.org/spec/#primitive-types), but there seems to have been some discussion about removing it? I could not find the original discussion, but a reference to the discussion can be found here (https://github.com/trinodb/trino/issues/6663).
>
> I generally agree with the consensus in the Trino issue to keep UUID in Iceberg. To summarize…
>
> - It makes sense to keep the type now that row identifiers are supported
> - Some engines (Trino) have support for the UUID type
> - Engines w/o support for UUID type can determine how to map
>
> Does anyone want to remove the type? If so, why?