Re: HBase Types: Explicit Null Support

Ted Yu Mon, 01 Apr 2013 17:11:08 -0700

bq. I create a dummy qualifier with a dummy value

For any single application, the above can be done.
For generic applications, how would we do this?


Thanks


On Mon, Apr 1, 2013 at 5:07 PM, Matt Corgan wrote:

> I generally don't allow nulls in my composite row keys.  Does SQL allow
> nulls in the PK?  In the rare case I wanted to do that I might create a
> separate format called NullableCInt32 with 5 bytes where the first one
> determined null.  It's important to keep the pure types pure.
>
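Matt's NullableCInt32 idea can be sketched in Java as below. This is an illustration under stated assumptions, not code from the thread: the class name NullableInt32Codec and the header values (0x00 for null, 0x01 for present) are hypothetical. Flipping the sign bit of the 4-byte payload makes unsigned lexicographic byte order agree with signed int order, and the 0x00 header makes null sort before every value while keeping the full int32 range.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch (not from the hotpads repo) of a 5-byte nullable int32:
// header byte 0x00 = null, 0x01 = present, then a 4-byte big-endian payload.
final class NullableInt32Codec {
    static final int WIDTH = 5;
    private static final byte NULL_HEADER = 0x00;
    private static final byte PRESENT_HEADER = 0x01;

    private NullableInt32Codec() {}

    static byte[] encode(Integer value) {
        ByteBuffer buf = ByteBuffer.allocate(WIDTH); // zero-filled, big-endian
        if (value == null) {
            buf.put(NULL_HEADER); // payload stays 0x00 0x00 0x00 0x00
        } else {
            buf.put(PRESENT_HEADER);
            // Flipping the sign bit makes unsigned byte order match signed order.
            buf.putInt(value ^ Integer.MIN_VALUE);
        }
        return buf.array();
    }

    static Integer decode(byte[] bytes) {
        if (bytes.length != WIDTH) {
            throw new IllegalArgumentException("expected " + WIDTH + " bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte header = buf.get();
        if (header == NULL_HEADER) {
            return null;
        }
        if (header != PRESENT_HEADER) {
            throw new IllegalArgumentException("unknown header byte: " + header);
        }
        return buf.getInt() ^ Integer.MIN_VALUE; // undo the sign-bit flip
    }
}
```

The cost is one extra byte per value, which is the trade-off that keeps the plain 4-byte type free of any null handling.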
> I have lots of null *values* however, but they're represented by lack of a
> qualifier in the Put.  If a row has all null values, I create a dummy
> qualifier with a dummy value to make sure the row key gets inserted as it
> would in sql.
>
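For Ted's question about generic applications, one possible shape is a helper that every writer goes through. The sketch below uses plain Java maps (qualifier to value) in place of the HBase client API, and the reserved qualifier "_" with an empty value is an assumption: a real schema would need to reserve a qualifier that no application column can use, and readers must ignore it.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns a row of possibly-null column values into the
// cells that would actually be written. A null value is represented by the
// absence of its qualifier; an all-null row gets one placeholder cell so the
// row key itself still exists.
final class NullAwareRowWriter {
    // Assumed reserved qualifier; application columns must never use it.
    static final String PLACEHOLDER_QUALIFIER = "_";
    static final byte[] PLACEHOLDER_VALUE = new byte[0];

    private NullAwareRowWriter() {}

    static Map<String, byte[]> toCells(Map<String, byte[]> columns) {
        Map<String, byte[]> cells = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : columns.entrySet()) {
            if (PLACEHOLDER_QUALIFIER.equals(e.getKey())) {
                throw new IllegalArgumentException("qualifier is reserved: " + e.getKey());
            }
            if (e.getValue() != null) { // null value => no cell written
                cells.put(e.getKey(), e.getValue());
            }
        }
        if (cells.isEmpty()) {
            cells.put(PLACEHOLDER_QUALIFIER, PLACEHOLDER_VALUE);
        }
        return Collections.unmodifiableMap(cells);
    }

    // Reading back: any column absent from the stored cells is null,
    // and the placeholder itself is never exposed as a column.
    static byte[] read(Map<String, byte[]> cells, String qualifier) {
        return PLACEHOLDER_QUALIFIER.equals(qualifier) ? null : cells.get(qualifier);
    }
}
```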
>
> On Mon, Apr 1, 2013 at 4:49 PM, James Taylor wrote:
>
> > On 04/01/2013 04:41 PM, Nick Dimiduk wrote:
> >
> >> On Mon, Apr 1, 2013 at 4:31 PM, James Taylor wrote:
> >>
> >>> From the SQL perspective, handling null is important.
> >>
> >> From your perspective, it is critical to support NULLs, even at the
> >> expense of giving up fixed-width encodings entirely or of not being able
> >> to represent the full range of values. That is, you'd rather be able to
> >> represent NULL than -2^31?
> >>
> > We've been able to get away with supporting NULL through the absence of
> > the value rather than restricting the data range. We haven't had any
> > pushback on not allowing a fixed-width nullable leading row key column.
> > Since our variable-length DECIMAL supports null and is a superset of the
> > fixed-width numeric types, users have a reasonable alternative.
> >
> > I'd rather not restrict the range of values, since it doesn't seem like
> > this would be necessary.
> >
> >
> >> On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
> >>
> >>> Thanks for the thoughtful response (and code!).
> >>>>
> >>>> I'm thinking I will press forward with a base implementation that does
> >>>> not support nulls. The idea is to provide an extensible set of
> >>>> interfaces, so I think this will not box us into a corner later. That
> >>>> is, a mirroring package could be implemented that supports null values
> >>>> and accepts the relevant trade-offs.
> >>>>
> >>>> Thanks,
> >>>> Nick
> >>>>
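Nick's plan of a null-free base with a separate, mirroring package that adds null support might look roughly like this. All names (OrderedCodec, Int64Codec, NullableCodec) are hypothetical and not HBase API; the wrapper borrows the header-byte approach from Matt's NullableCInt32 suggestion, so it can wrap any base codec without sacrificing part of its value range.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical base interface: order-preserving codecs that never see null.
interface OrderedCodec<T> {
    byte[] encode(T value);

    T decode(byte[] bytes);
}

// Base, null-free codec: fixed-width 8-byte signed long, sign bit flipped
// so that unsigned lexicographic byte order matches numeric order.
final class Int64Codec implements OrderedCodec<Long> {
    @Override
    public byte[] encode(Long value) {
        if (value == null) {
            throw new IllegalArgumentException("Int64Codec does not support null");
        }
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    @Override
    public Long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }
}

// Hypothetical wrapper from a separate "nullable" package: prefixes a header
// byte (0x00 = null, 0x01 = present) so null sorts before every value.
final class NullableCodec<T> implements OrderedCodec<T> {
    private final OrderedCodec<T> delegate;

    NullableCodec(OrderedCodec<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public byte[] encode(T value) {
        if (value == null) {
            return new byte[] {0x00};
        }
        byte[] payload = delegate.encode(value);
        byte[] out = new byte[payload.length + 1];
        out[0] = 0x01;
        System.arraycopy(payload, 0, out, 1, payload.length);
        return out;
    }

    @Override
    public T decode(byte[] bytes) {
        if (bytes.length == 0) {
            throw new IllegalArgumentException("empty encoding");
        }
        if (bytes[0] == 0x00) {
            return null;
        }
        return delegate.decode(Arrays.copyOfRange(bytes, 1, bytes.length));
    }
}
```

One trade-off to note: the wrapped null encodes as a single byte, so wrapping a fixed-width codec yields a variable-width one. Padding the null encoding to the full width would restore fixed width at the cost of more bytes.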
> >>>> On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan wrote:
> >>>>
> >>>>> I spent some time this weekend extracting bits of our serialization
> >>>>> code to a public GitHub repo at http://github.com/hotpads/data-tools.
> >>>>> Contributions are welcome; I'm sure we all have this stuff lying
> >>>>> around.
> >>>>>
> >>>>> You can see I've bumped into the NULL problem in a few places:
> >>>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
> >>>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
> >>>>>
> >>>>> Looking back, I think my latest opinion on the topic is to reject
> >>>>> nullability as the rule, since it can cause unexpected behavior and
> >>>>> confusion. It's cleaner to provide a wrapper class (so both
> >>>>> LongArrayList and NullableLongArrayList) that explicitly defines the
> >>>>> behavior and costs a little more in performance. If the user can't
> >>>>> find a pre-made wrapper class, it's not very difficult for each user
> >>>>> to provide their own interpretation of null and check for it
> >>>>> themselves.
> >>>>>
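A wrapper in the spirit of Matt's NullableLongArrayList could keep primitive long storage and track nulls in a separate BitSet, so no long value has to double as null. This is a hypothetical sketch, not the hotpads class: it supports append, get, set and size, while other mutators such as remove(int) throw UnsupportedOperationException, which is AbstractList's default. The BitSet check on every access is the kind of small performance cost described above.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical NullableLongArrayList: primitive long[] storage plus a BitSet
// marking null slots, so every long value (including Long.MIN_VALUE) stays usable.
final class NullableLongArrayList extends AbstractList<Long> {
    private long[] values = new long[8];
    private final BitSet nulls = new BitSet();
    private int size;

    @Override
    public boolean add(Long element) {
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2); // grow geometrically
        }
        size++;
        store(size - 1, element);
        modCount++;
        return true;
    }

    @Override
    public Long get(int index) {
        checkIndex(index);
        if (nulls.get(index)) {
            return null;
        }
        return values[index];
    }

    @Override
    public Long set(int index, Long element) {
        Long previous = get(index); // also validates the index
        store(index, element);
        return previous;
    }

    @Override
    public int size() {
        return size;
    }

    private void store(int index, Long element) {
        if (element == null) {
            nulls.set(index);
            values[index] = 0L; // placeholder; never read while the null bit is set
        } else {
            nulls.clear(index);
            values[index] = element;
        }
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
    }
}
```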
> >>>>> If you reject nullability, the question becomes what to do in
> >>>>> situations where you're implementing existing interfaces that accept
> >>>>> nullable params. The LongArrayList above implements List<Long>, which
> >>>>> requires an add(Long) method. In the above implementation I chose to
> >>>>> swap nulls with Long.MIN_VALUE; however, I'm now thinking it's best to
> >>>>> force the user to make that swap and to throw IllegalArgumentException
> >>>>> if they pass null.
> >>>>>
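The stricter behavior Matt proposes, rejecting null instead of mapping it to Long.MIN_VALUE, can be sketched as follows. StrictLongArrayList is a hypothetical name chosen so as not to imply this is the hotpads LongArrayList.

```java
import java.util.AbstractList;
import java.util.Arrays;

// Hypothetical strict List<Long> over a primitive long[]: null is rejected
// with IllegalArgumentException rather than silently swapped for a sentinel.
final class StrictLongArrayList extends AbstractList<Long> {
    private long[] values = new long[8];
    private int size;

    @Override
    public boolean add(Long element) {
        long v = requireValue(element);
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2); // grow geometrically
        }
        values[size++] = v;
        modCount++;
        return true;
    }

    @Override
    public Long set(int index, Long element) {
        long v = requireValue(element);
        long previous = getLong(index); // also validates the index
        values[index] = v;
        return previous;
    }

    @Override
    public Long get(int index) {
        return getLong(index);
    }

    @Override
    public int size() {
        return size;
    }

    // Unboxed accessor for callers that want to avoid allocating a Long.
    long getLong(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return values[index];
    }

    private static long requireValue(Long element) {
        if (element == null) {
            throw new IllegalArgumentException(
                    "null not supported; swap it for an explicit sentinel before adding");
        }
        return element;
    }
}
```

The java.util.List documentation names NullPointerException as the conventional exception when a list does not permit null elements; the sketch follows the IllegalArgumentException suggestion from the thread instead. Either choice makes the failure explicit rather than silently storing a sentinel.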
> >>>>>
> >>>>> On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil wrote:
> >>>>>> Hmmm... good question.
> >>>>>>
> >>>>>> I think fixed-width support is important for a great many rowkey
> >>>>>> constructs, so I'd rather see something like losing MIN_VALUE and
> >>>>>> keeping fixed width.
> >>>>>>
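Doug's option (give up MIN_VALUE, keep 8 bytes) fits neatly with the 0x00 requirement Nick raises in his original message: flipping the sign bit, a common order-preserving trick, maps Long.MIN_VALUE to eight 0x00 bytes, which is exactly the lowest-sorting slot. The sketch below is hypothetical (the class name NullableFixedInt64 is an assumption) and rejects Long.MIN_VALUE so it can never be confused with null.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-width (8-byte) nullable long encoding that gives up
// Long.MIN_VALUE. Flipping the sign bit makes unsigned lexicographic byte
// order match signed order and maps Long.MIN_VALUE to eight 0x00 bytes,
// which is reused as the encoding of null.
final class NullableFixedInt64 {
    static final int WIDTH = Long.BYTES;

    private NullableFixedInt64() {}

    static byte[] encode(Long value) {
        if (value == null) {
            return new byte[WIDTH]; // all 0x00: sorts before every non-null value
        }
        if (value == Long.MIN_VALUE) {
            throw new IllegalArgumentException("Long.MIN_VALUE is reserved for null");
        }
        return ByteBuffer.allocate(WIDTH).putLong(value ^ Long.MIN_VALUE).array();
    }

    static Long decode(byte[] bytes) {
        if (bytes.length != WIDTH) {
            throw new IllegalArgumentException("expected " + WIDTH + " bytes, got " + bytes.length);
        }
        long raw = ByteBuffer.wrap(bytes).getLong();
        return raw == 0L ? null : Long.valueOf(raw ^ Long.MIN_VALUE);
    }
}
```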
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 4/1/13 2:00 PM, "Nick Dimiduk" wrote:
> >>>>>>
> >>>>>>> Heya,
> >>>>>>
> >>>>>>> Thinking about data types and serialization. I think null support is
> >>>>>>> an important characteristic for the serialized representations,
> >>>>>>> especially when considering the compound type. However, supporting
> >>>>>>> null is directly incompatible with fixed-width representations for
> >>>>>>> numerics. For instance, if we want to have a fixed-width signed long
> >>>>>>> stored in 8 bytes, where do you put null? Float and double types can
> >>>>>>> cheat a little by folding negative and positive NaNs into a single
> >>>>>>> representation (this isn't strictly correct!), leaving a place to
> >>>>>>> represent null. In the long example case, the obvious choice is to
> >>>>>>> reduce MAX_VALUE or increase MIN_VALUE by one. This will allocate an
> >>>>>>> additional encoding which can be used for null. My experience working
> >>>>>>> with scientific data, however, makes me wince at the idea.
> >>>>>>>
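The NaN-folding trick Nick mentions can be sketched for doubles as below; the class name NullableFixedDouble is hypothetical. Double.doubleToLongBits returns the canonical 0x7ff8000000000000L for every NaN, so a NaN's sign and payload are discarded (the "not strictly correct" part). After the usual order-preserving transform (flip all bits of negative values, flip only the sign bit of non-negative ones), the encodings that negative-signed NaNs would have occupied sort below negative infinity and are never produced by a real value; the lowest of them, eight 0x00 bytes, is used for null.

```java
import java.nio.ByteBuffer;

// Hypothetical 8-byte nullable double encoding. doubleToLongBits collapses
// every NaN to one canonical bit pattern, freeing the encodings that
// negative-signed NaNs would otherwise occupy; all-zero bytes means null.
final class NullableFixedDouble {
    static final int WIDTH = Long.BYTES;

    private NullableFixedDouble() {}

    static byte[] encode(Double value) {
        if (value == null) {
            return new byte[WIDTH]; // all 0x00: sorts below negative infinity
        }
        long bits = Double.doubleToLongBits(value); // canonicalizes NaN
        // Negative: flip all bits so larger magnitudes sort first.
        // Non-negative: flip only the sign bit so they sort after negatives.
        bits = bits < 0 ? ~bits : bits ^ Long.MIN_VALUE;
        return ByteBuffer.allocate(WIDTH).putLong(bits).array();
    }

    static Double decode(byte[] bytes) {
        if (bytes.length != WIDTH) {
            throw new IllegalArgumentException("expected " + WIDTH + " bytes, got " + bytes.length);
        }
        long bits = ByteBuffer.wrap(bytes).getLong();
        if (bits == 0L) {
            return null;
        }
        // Encodings with the top bit set came from non-negative doubles.
        bits = bits < 0 ? bits ^ Long.MIN_VALUE : ~bits;
        return Double.longBitsToDouble(bits);
    }
}
```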
> >>>>>>> The variable-width encodings have it a little easier. There's already
> >>>>>>> enough going on that it's simpler to make room.
> >>>>>>>
> >>>>>>> Remember, the final goal is to support order-preserving
> >>>>>>> serialization. This imposes some limitations on our encoding
> >>>>>>> strategies. For instance, it's not enough to simply encode null; it
> >>>>>>> really needs to be encoded as 0x00 so as to sort lexicographically
> >>>>>>> earlier than any other value.
> >>>>>>
> >>>>>>> What do you think? Any ideas, experiences, etc?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Nick
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >
>
