Actually, that isn't all that far-fetched a format, Matt. It's pretty common any time someone wants sortable lat/long (*cough* three-letter agencies *cough*).
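
For anyone who hasn't seen a sortable lat/long key before, here is a rough sketch of one way to build it: scale each coordinate to 32 bits, interleave the bits (a Z-order or Morton code), and write the result big-endian so that unsigned byte comparison matches the numeric order. This is not Matt's code; the class name GeoKeySketch, the method names, and the choice of latitude in the odd bit positions are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a sortable 64-bit lat/long key; not the hotpads implementation.
final class GeoKeySketch {

    // Map a coordinate in [min, max] onto an unsigned 32-bit integer held in a long.
    static long scale(double value, double min, double max) {
        double clamped = Math.max(min, Math.min(max, value));
        double fraction = (clamped - min) / (max - min);
        return (long) (fraction * 0xFFFFFFFFL);
    }

    // Spread the low 32 bits of x so that they land on the even bit positions.
    static long spread(long x) {
        x &= 0xFFFFFFFFL;
        x = (x | (x << 16)) & 0x0000FFFF0000FFFFL;
        x = (x | (x << 8))  & 0x00FF00FF00FF00FFL;
        x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FL;
        x = (x | (x << 2))  & 0x3333333333333333L;
        x = (x | (x << 1))  & 0x5555555555555555L;
        return x;
    }

    // Latitude bits go to odd positions, longitude bits to even positions (an assumption).
    static long interleave(double lat, double lon) {
        return (spread(scale(lat, -90.0, 90.0)) << 1) | spread(scale(lon, -180.0, 180.0));
    }

    // ByteBuffer is big-endian by default, so unsigned byte order follows unsigned value order.
    static byte[] toKey(double lat, double lon) {
        return ByteBuffer.allocate(8).putLong(interleave(lat, lon)).array();
    }
}
```

Keys built this way tend to keep nearby points close together in the sort order, which is the usual reason for interleaving rather than concatenating the two coordinates.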
Wouldn't we get the same by providing a simple set of libraries (à la orderly plus other useful HBase things) while still giving access to the underlying byte array? Perhaps a nullable key type in that lib makes sense if lots of people need it, and it would be nice to have standard libraries so tools could interop much more easily.

-------------------
Jesse Yates
@jesse_yates
jyates.github.com

On Mon, Apr 1, 2013 at 11:17 PM, Matt Corgan <mcor...@hotpads.com> wrote:

> Ah, I didn't even realize SQL allowed null key parts. Maybe a goal of the interfaces should be to provide first-class support for custom user types in addition to the standard ones included. Part of the power of HBase's plain byte[] keys is that users can concoct the perfect key for their data type. For example, I have a lot of geographic data where I interleave latitude/longitude bits into a sortable 64-bit value that would probably never be included in a standard library.
>
> On Mon, Apr 1, 2013 at 8:38 PM, Enis Söztutar <enis....@gmail.com> wrote:
>
> > I think having Int32 and NullableInt32 would support minimum overhead, as well as allowing SQL semantics.
> >
> > On Mon, Apr 1, 2013 at 7:26 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
> >
> > > Furthermore, is it more important to support null values than to squeeze all representations into minimum size (4 bytes for int32, &c.)?
> > >
> > > On Apr 1, 2013 4:41 PM, "Nick Dimiduk" <ndimi...@gmail.com> wrote:
> > >
> > > > On Mon, Apr 1, 2013 at 4:31 PM, James Taylor <jtay...@salesforce.com> wrote:
> > > >
> > > >> From the SQL perspective, handling null is important.
> > > >
> > > > From your perspective, it is critical to support NULLs, even at the expense of fixed-width encodings at all or supporting representation of a full range of values. That is, you'd rather be able to represent NULL than -2^31?
> > > >
> > > > On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
> > > >>
> > > >>> Thanks for the thoughtful response (and code!).
> > > >>>
> > > >>> I'm thinking I will press forward with a base implementation that does not support nulls. The idea is to provide an extensible set of interfaces, so I think this will not box us into a corner later. That is, a mirroring package could be implemented that supports null values and accepts the relevant trade-offs.
> > > >>>
> > > >>> Thanks,
> > > >>> Nick
> > > >>>
> > > >>> On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan <mcor...@hotpads.com> wrote:
> > > >>>
> > > >>>> I spent some time this weekend extracting bits of our serialization code to a public github repo at http://github.com/hotpads/data-tools. Contributions are welcome - I'm sure we all have this stuff laying around.
> > > >>>>
> > > >>>> You can see I've bumped into the NULL problem in a few places:
> > > >>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
> > > >>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
> > > >>>>
> > > >>>> Looking back, I think my latest opinion on the topic is to reject nullability as the rule since it can cause unexpected behavior and confusion. It's cleaner to provide a wrapper class (so both LongArrayList plus NullableLongArrayList) that explicitly defines the behavior and costs a little more in performance. If the user can't find a pre-made wrapper class, it's not very difficult for each user to provide their own interpretation of null and check for it themselves.
> > > >>>>
> > > >>>> If you reject nullability, the question becomes what to do in situations where you're implementing existing interfaces that accept nullable params. The LongArrayList above implements List<Long>, which requires an add(Long) method. In the above implementation I chose to swap nulls with Long.MIN_VALUE; however, I'm now thinking it best to force the user to make that swap and then throw IllegalArgumentException if they pass null.
> > > >>>>
> > > >>>> On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil <doug.m...@explorysmedical.com> wrote:
> > > >>>>
> > > >>>>> Hmmm... good question.
> > > >>>>>
> > > >>>>> I think that fixed-width support is important for a great many rowkey constructs, so I'd rather see something like losing MIN_VALUE and keeping fixed width.
> > > >>>>>
> > > >>>>> On 4/1/13 2:00 PM, "Nick Dimiduk" <ndimi...@gmail.com> wrote:
> > > >>>>>
> > > >>>>>> Heya,
> > > >>>>>>
> > > >>>>>> Thinking about data types and serialization. I think null support is an important characteristic for the serialized representations, especially when considering the compound type. However, doing so is directly incompatible with fixed-width representations for numerics. For instance, if we want to have a fixed-width signed long stored on 8 bytes, where do you put null? float and double types can cheat a little by folding negative and positive NaNs into a single representation (this isn't strictly correct!), leaving a place to represent null. In the long example case, the obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by one. This will allocate an additional encoding which can be used for null. My experience working with scientific data, however, makes me wince at the idea.
> > > >>>>>>
> > > >>>>>> The variable-width encodings have it a little easier. There's already enough going on that it's simpler to make room.
> > > >>>>>>
> > > >>>>>> Remember, the final goal is to support order-preserving serialization. This imposes some limitations on our encoding strategies. For instance, it's not enough to simply encode null; it really needs to be encoded as 0x00 so as to sort lexicographically earlier than any other value.
> > > >>>>>>
> > > >>>>>> What do you think? Any ideas, experiences, etc?
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>> Nick
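
To make the trade-off discussed in this thread concrete, here is a minimal sketch of the "plain type plus nullable wrapper" idea, assuming a one-byte header is an acceptable cost for the nullable variant. The non-null encoding flips the sign bit so that big-endian bytes sort in signed numeric order; the nullable encoding prefixes 0x00 for null and 0x01 otherwise, so null sorts before every value and Long.MIN_VALUE stays representable. The class OrderedLongSketch and all of its method names are hypothetical and are not taken from orderly, data-tools, or HBase.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of order-preserving long encodings, with and without null support.
final class OrderedLongSketch {
    static final byte NULL_MARKER = 0x00;     // sorts before any present value
    static final byte PRESENT_MARKER = 0x01;

    // Fixed width, 8 bytes, no null: flipping the sign bit makes negatives sort first.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(8).putLong(value ^ Long.MIN_VALUE).array();
    }

    // Nullable variant: 1 byte for null, 9 bytes otherwise (marker plus encoded value).
    static byte[] encodeNullable(Long value) {
        if (value == null) {
            return new byte[] { NULL_MARKER };
        }
        return ByteBuffer.allocate(9)
                .put(PRESENT_MARKER)
                .putLong(value ^ Long.MIN_VALUE)
                .array();
    }

    // Inverse of encodeNullable.
    static Long decodeNullable(byte[] bytes) {
        if (bytes[0] == NULL_MARKER) {
            return null;
        }
        return ByteBuffer.wrap(bytes, 1, 8).getLong() ^ Long.MIN_VALUE;
    }

    // Unsigned lexicographic comparison, the ordering byte[] row keys are sorted by.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

The nullable form gives up the strict 8-byte width, which is the overhead Enis's Int32/NullableInt32 split would let callers opt out of when they know a column is never null.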