Aren't we going to need efficient encodings in order to make decimal work well, anyway?
- Dan

On Thu, Nov 16, 2017 at 2:54 PM, Todd Lipcon <[email protected]> wrote:

> On Thu, Nov 16, 2017 at 2:28 PM, Dan Burkert <[email protected]>
> wrote:
>
> > I think it would be useful. As far as I've seen the main costs in
> > carrying data types are in writing performant encoders, and updating
> > integrations to work with them. I'm guessing with 128 bit integers there
> > would be some integrations that can't or won't support it, which might be a
> > cause for confusion. Overall, though, I think the upsides of efficiency
> > and decreased storage space are compelling. Do you have a sense yet of
> > what encodings are going to be supported down the road (will we get to full
> > parity with 32/64)?
> >
>
> Yea, my concerns are:
>
> 1) Integrations: do we have a compatible SQL type to map this to in Spark
> SQL, Impala, Presto, etc? What type would we map to in Java? It seems like
> the most natural mapping would be DECIMAL(39) or somesuch in SQL. So, if
> we're going to map it the same as decimal anyway, why not just _not_ expose
> it and only expose decimal? If someone wants to store a 128-bit hash as a
> DECIMAL(39) they are free to, of course. Postgres's built-in int types only
> go up to 64-bit (bigint)
>
> In addition to the choice of DECIMAL, for things like fixed-length binary
> maybe we are better off later adding a fixed-length BINARY type, like
> BINARY(16) which could be used for storing large hashes? There is precedent
> for fixed-length CHAR(n) in SQL, but no such precedent for int128.
>
>
> 2) Encoders: like Dan mentioned, it seems like we might not be able to do a
> very efficient job of encoding these very large integers. Stuff like
> bitshuffle, SIMD bitpacking, etc, isn't really designed for such large
> values. So, I'm a little afraid that we'll end up only with PLAIN and
> people will be upset with the storage overhead and performance.
>
> -Todd
>
> > On Thu, Nov 16, 2017 at 2:19 PM, Grant Henke <[email protected]>
> > wrote:
> >
> >> Hi all,
> >>
> >> As a part of adding DECIMAL support to Kudu it was necessary to add
> >> internal support for 128 bit integers. Taking that one step further and
> >> supporting public columns and APIs for 128 bit integers would not be too
> >> much additional work. However, I wanted to gauge the interest from the
> >> community.
> >>
> >> My initial thoughts are that having an INT128 column type could be useful
> >> for things like UUIDs, IPv6 addresses, MD5 hashes and other similar types
> >> of data.
> >>
> >> Is there any interest or uses for an INT128 column type? Is anyone
> >> currently using a STRING or BINARY column for 128 bit data?
> >>
> >> Thank you,
> >> Grant
> >> --
> >> Grant Henke
> >> Software Engineer | Cloudera
> >> [email protected] | twitter.com/gchenke | linkedin.com/in/granthenke
> >>
> >
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
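A minimal sketch, not from the thread, of the arithmetic behind the mappings discussed above: a UUID, an IPv6 address, or an MD5 digest each fits in an unsigned 128-bit integer, which takes at most 39 decimal digits (hence DECIMAL(39)) or exactly 16 bytes (hence BINARY(16)). It uses only Python's standard library (`uuid`, `ipaddress`, `hashlib`); every helper name here is hypothetical and nothing reflects Kudu's actual API. `bit_width` is a deliberately simplified model of bit-packing (each value in a block stored in as many bits as the largest one needs), not Kudu's encoder.

```python
import hashlib
import ipaddress
import uuid

# Illustrative helpers (hypothetical names, not a Kudu API): each maps a
# common 128-bit identifier to an unsigned integer in [0, 2**128 - 1].
MAX_U128 = 2**128 - 1


def uuid_to_int(u: uuid.UUID) -> int:
    return u.int  # the UUID's 128-bit value


def ipv6_to_int(addr: str) -> int:
    return int(ipaddress.IPv6Address(addr))


def md5_to_int(data: bytes) -> int:
    # An MD5 digest is 16 bytes; read them as a big-endian unsigned integer.
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def as_binary16(value: int) -> bytes:
    # Fixed-length 16-byte form, the shape a BINARY(16) column would hold.
    return value.to_bytes(16, "big")


def decimal_digits(value: int) -> int:
    # Precision a DECIMAL column would need to hold this value exactly.
    return len(str(value))


def bit_width(values) -> int:
    # Simplified bit-packing model: every value in a block is stored in as
    # many bits as the largest value needs.
    return max(v.bit_length() for v in values)


if __name__ == "__main__":
    samples = {
        "uuid": uuid_to_int(uuid.UUID("12345678-1234-5678-1234-567812345678")),
        "ipv6": ipv6_to_int("2001:db8::1"),
        "md5": md5_to_int(b"kudu"),
    }
    for name, v in samples.items():
        assert 0 <= v <= MAX_U128
        print(name, decimal_digits(v), as_binary16(v).hex())

    print(decimal_digits(MAX_U128))  # 39
    # Small counters pack into a few bits; hash values need close to 128.
    print(bit_width(range(1, 1000)),
          bit_width(md5_to_int(bytes([i])) for i in range(64)))
```

Under this simplified model, the last line hints at the encoder concern: hash-like values leave bit-packing almost nothing to strip, so they stay close to PLAIN size regardless of how the encoder is vectorized.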
