Since SQL allows null-valued composite key parts, we needed to support them.
On 04/01/2013 05:10 PM, Ted Yu wrote:
bq. I create a dummy qualifier with a dummy value
For any single application, the above can be done.
For generic applications, how would we do this?
Thanks
On Mon, Apr 1, 2013 at 5:07 PM, Matt Corgan wrote:
I generally don't allow nulls in my composite row keys. Does SQL allow
nulls in the PK? In the rare case I wanted to do that, I might create a
separate format called NullableCInt32 with 5 bytes, where the first byte
indicates null. It's important to keep the pure types pure.
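Below is a minimal Java sketch of the NullableCInt32 idea. It assumes the flag byte is 0x00 for null and 0x01 otherwise. It also assumes the remaining 4 bytes hold a big-endian int32 with its sign bit flipped, so that byte order matches numeric order. The class name comes from the message above. The layout details are assumptions, not an existing implementation.

```java
// Hypothetical sketch of "NullableCInt32": 5 bytes, where byte 0 flags
// null (0x00) or present (0x01) and bytes 1..4 hold an order-preserving
// big-endian int32 (sign bit flipped so negatives sort first).
final class NullableCInt32 {
    static final int LENGTH = 5;
    private static final byte NULL_FLAG = 0x00;
    private static final byte PRESENT_FLAG = 0x01;

    private NullableCInt32() {}

    static byte[] encode(Integer value) {
        byte[] out = new byte[LENGTH];
        if (value == null) {
            out[0] = NULL_FLAG; // bytes 1..4 stay 0x00
            return out;
        }
        out[0] = PRESENT_FLAG;
        int flipped = value ^ Integer.MIN_VALUE; // flip the sign bit
        out[1] = (byte) (flipped >>> 24);
        out[2] = (byte) (flipped >>> 16);
        out[3] = (byte) (flipped >>> 8);
        out[4] = (byte) flipped;
        return out;
    }

    static Integer decode(byte[] in) {
        if (in.length != LENGTH) {
            throw new IllegalArgumentException("expected " + LENGTH + " bytes");
        }
        if (in[0] == NULL_FLAG) {
            return null;
        }
        if (in[0] != PRESENT_FLAG) {
            throw new IllegalArgumentException("unknown flag byte " + in[0]);
        }
        int flipped = ((in[1] & 0xFF) << 24) | ((in[2] & 0xFF) << 16)
                | ((in[3] & 0xFF) << 8) | (in[4] & 0xFF);
        return flipped ^ Integer.MIN_VALUE; // undo the sign-bit flip
    }
}
```

Null gets its own flag byte, so the full int32 range (including Integer.MIN_VALUE) stays representable. The cost is one extra byte per value.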
I do have lots of null *values*, but they're represented by the absence
of a qualifier in the Put. If a row has all null values, I create a
dummy qualifier with a dummy value to make sure the row key gets
inserted as it would in SQL.
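Here is a rough sketch of that convention. It models the cells of a Put as a sorted map from qualifier to value instead of using the HBase client API. The dummy qualifier name "_0" and its empty value are hypothetical placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: a null column value is represented by omitting its
// qualifier entirely; if every value is null, a dummy qualifier is written
// so the row key still gets stored. "_0" is an assumed placeholder name.
final class SparseRow {
    static final String DUMMY_QUALIFIER = "_0";
    static final byte[] DUMMY_VALUE = new byte[0];

    private SparseRow() {}

    /** Maps column name -> nullable string value onto the cells to write. */
    static SortedMap<String, byte[]> toCells(Map<String, String> columns) {
        SortedMap<String, byte[]> cells = new TreeMap<>();
        for (Map.Entry<String, String> e : columns.entrySet()) {
            if (e.getValue() != null) { // null value: write no cell at all
                cells.put(e.getKey(), e.getValue().getBytes(StandardCharsets.UTF_8));
            }
        }
        if (cells.isEmpty()) {
            cells.put(DUMMY_QUALIFIER, DUMMY_VALUE); // keep the row key present
        }
        return cells;
    }
}
```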
On Mon, Apr 1, 2013 at 4:49 PM, James Taylor wrote:
On 04/01/2013 04:41 PM, Nick Dimiduk wrote:
On Mon, Apr 1, 2013 at 4:31 PM, James Taylor wrote:
From the SQL perspective, handling null is important.
From your perspective, it is critical to support NULLs, even at the
expense of giving up fixed-width encodings altogether or the ability to
represent the full range of values. That is, you'd rather be able to
represent NULL than -2^31?
We've been able to get away with supporting NULL through the absence of
the value rather than by restricting the data range. We haven't had any
pushback on not allowing a fixed-width nullable leading row key column.
Since our variable-length DECIMAL supports null and is a superset of the
fixed-width numeric types, users have a reasonable alternative.
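The Java sketch below shows one way to get "null through absence" in a row key. It rests on three assumptions. Variable-length parts are joined with a 0x00 separator. A null part contributes zero bytes. No part contains 0x00 itself. This is only an illustration and is not claimed to be Phoenix's actual encoding. It also cannot distinguish an empty string from null.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch: variable-length key parts joined by a 0x00 separator,
// where a null part is simply absent (zero bytes). Assumes parts never
// contain 0x00; note that "" and null produce identical encodings.
final class CompositeKey {
    private static final int SEPARATOR = 0x00;

    private CompositeKey() {}

    static byte[] encode(List<String> parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                out.write(SEPARATOR);
            }
            String part = parts.get(i);
            if (part != null) { // null: write nothing for this part
                byte[] b = part.getBytes(StandardCharsets.UTF_8);
                out.write(b, 0, b.length);
            }
        }
        return out.toByteArray();
    }
}
```

With this layout, a key whose leading part is null starts with the separator byte 0x00. That is why it sorts before any key whose leading part is non-empty.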
I'd rather not restrict the range of values, since it doesn't seem like
this would be necessary.
On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
Thanks for the thoughtful response (and code!).
I'm thinking I will press forward with a base implementation that does
not support nulls. The idea is to provide an extensible set of
interfaces, so I think this will not box us into a corner later. That
is, a mirroring package could be implemented that supports null values
and accepts the relevant trade-offs.
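Such an extensible design might look like the sketch below. A base codec rejects null. A separate wrapper adds null support by spending one extra leading byte. All interface and class names here are hypothetical.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical base interface: implementations need not handle null.
interface OrderedCodec<T> {
    byte[] encode(T value);
    T decode(byte[] bytes);
}

// Pure 8-byte order-preserving long: covers the full range, rejects null.
final class Int64Codec implements OrderedCodec<Long> {
    @Override
    public byte[] encode(Long value) {
        Objects.requireNonNull(value, "Int64Codec does not support null");
        long flipped = value ^ Long.MIN_VALUE; // flip sign bit for byte-wise order
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) flipped;
            flipped >>>= 8;
        }
        return out;
    }

    @Override
    public Long decode(byte[] bytes) {
        long flipped = 0;
        for (byte b : bytes) {
            flipped = (flipped << 8) | (b & 0xFF);
        }
        return flipped ^ Long.MIN_VALUE;
    }
}

// "Mirroring" wrapper: 0x00 means null, 0x01 prefixes a non-null encoding.
final class NullableCodec<T> implements OrderedCodec<T> {
    private final OrderedCodec<T> inner;

    NullableCodec(OrderedCodec<T> inner) {
        this.inner = inner;
    }

    @Override
    public byte[] encode(T value) {
        if (value == null) {
            return new byte[] {0x00}; // sorts before every non-null value
        }
        byte[] body = inner.encode(value);
        byte[] out = new byte[body.length + 1];
        out[0] = 0x01;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    @Override
    public T decode(byte[] bytes) {
        if (bytes[0] == 0x00) {
            return null;
        }
        return inner.decode(Arrays.copyOfRange(bytes, 1, bytes.length));
    }
}
```

The wrapper keeps the base type pure. Int64Codec still covers the entire long range, and only callers who opt into NullableCodec pay for the flag byte.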
Thanks,
Nick
On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan wrote:
I spent some time this weekend extracting bits of our serialization code
to a public GitHub repo at http://github.com/hotpads/data-tools.
Contributions are welcome; I'm sure we all have this stuff lying around.
You can see I've bumped into the NULL problem in a few places:
* https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
* https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
Looking back, my latest opinion on the topic is to reject nullability by
default, since it can cause unexpected behavior and confusion. It's
cleaner to provide a wrapper class (so both LongArrayList and
NullableLongArrayList) that explicitly defines the behavior and costs a
little more in performance. If users can't find a pre-made wrapper
class, it's not very difficult for each of them to provide their own
interpretation of null and check for it themselves.
If you reject nullability, the question becomes what to do in situations
where you're implementing existing interfaces that accept nullable
params. The LongArrayList above implements List<Long>, which requires an
add(Long) method. In the above implementation I chose to swap nulls with
Long.MIN_VALUE; however, I'm now thinking it's best to force the user to
make that swap and throw IllegalArgumentException if they pass null.
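Here is a minimal sketch of the stricter behavior. It is a List<Long> backed by a primitive long[] that throws IllegalArgumentException on null instead of silently mapping null to Long.MIN_VALUE. StrictLongArrayList is a hypothetical name; this is not the hotpads LongArrayList.

```java
import java.util.AbstractList;
import java.util.Arrays;

// Hypothetical List<Long> over a primitive long[] that refuses null, so the
// caller must choose and apply their own sentinel before calling add().
final class StrictLongArrayList extends AbstractList<Long> {
    private long[] values = new long[8];
    private int size;

    @Override
    public boolean add(Long value) {
        if (value == null) {
            throw new IllegalArgumentException(
                    "null not supported; map it to a sentinel (e.g. Long.MIN_VALUE) yourself");
        }
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2); // grow the backing array
        }
        values[size++] = value;
        modCount++;
        return true;
    }

    @Override
    public Long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return values[index];
    }

    @Override
    public int size() {
        return size;
    }
}
```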
On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil wrote:
Hmmm... good question.
I think fixed-width support is important for a great many rowkey
constructs, so I'd rather see something like losing MIN_VALUE and
keeping fixed width.
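The sketch below shows the "lose MIN_VALUE" option, assuming the common sign-bit-flip encoding for 8-byte longs. Under that encoding, Long.MIN_VALUE maps to eight 0x00 bytes. That is exactly the slot that sorts first, so it can be reused for null. FixedNullableInt64 is a hypothetical name.

```java
// Hypothetical sketch: fixed-width 8-byte order-preserving long where the
// slot normally used by Long.MIN_VALUE (all 0x00 after the sign-bit flip)
// is reused for null, so Long.MIN_VALUE itself becomes unrepresentable.
final class FixedNullableInt64 {
    private FixedNullableInt64() {}

    static byte[] encode(Long value) {
        long bits;
        if (value == null) {
            bits = 0L; // eight 0x00 bytes: sorts before every non-null value
        } else if (value == Long.MIN_VALUE) {
            throw new IllegalArgumentException("Long.MIN_VALUE is reserved for null");
        } else {
            bits = value ^ Long.MIN_VALUE; // flip sign bit for byte-wise order
        }
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) bits;
            bits >>>= 8;
        }
        return out;
    }

    static Long decode(byte[] bytes) {
        long bits = 0;
        for (byte b : bytes) {
            bits = (bits << 8) | (b & 0xFF);
        }
        if (bits == 0L) {
            return null;
        }
        return bits ^ Long.MIN_VALUE;
    }
}
```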
On 4/1/13 2:00 PM, "Nick Dimiduk" wrote:
Heya,
Thinking about data types and serialization. I think null support is an
important characteristic for the serialized representations, especially
when considering the compound type. However, supporting null is directly
incompatible with fixed-width representations for numerics. For
instance, if we want a fixed-width signed long stored in 8 bytes, where
do you put null? float and double types can cheat a little by folding
negative and positive NaNs into a single representation (this isn't
strictly correct!), leaving a place to represent null. In the long case,
the obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by one.
This frees up one encoding that can be used for null. My experience
working with scientific data, however, makes me wince at the idea.
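The NaN trick for doubles can be sketched as follows. It assumes the standard order-preserving transform: invert all bits of negative values and flip only the sign bit of non-negative ones. Double.doubleToLongBits already collapses every NaN to the canonical 0x7ff8000000000000L, so negative NaNs never reach the encoder. That leaves the all-zero encoding, which would otherwise belong to a negative NaN, free for null. As noted above, this discards NaN payload distinctions. FixedNullableFloat64 is a hypothetical name.

```java
// Hypothetical sketch: order-preserving 8-byte double in which all NaNs are
// folded into one canonical NaN, freeing the all-zero encoding for null.
final class FixedNullableFloat64 {
    private FixedNullableFloat64() {}

    static byte[] encode(Double value) {
        long enc;
        if (value == null) {
            enc = 0L; // eight 0x00 bytes: sorts before -Infinity
        } else {
            // doubleToLongBits (unlike doubleToRawLongBits) maps every NaN
            // to 0x7ff8000000000000L, so no negative NaN is ever encoded.
            long bits = Double.doubleToLongBits(value);
            // negatives: invert all bits so larger magnitudes sort first;
            // non-negatives: flip only the sign bit so they sort after negatives
            enc = bits < 0 ? ~bits : bits ^ Long.MIN_VALUE;
        }
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) enc;
            enc >>>= 8;
        }
        return out;
    }

    static Double decode(byte[] bytes) {
        long enc = 0;
        for (byte b : bytes) {
            enc = (enc << 8) | (b & 0xFF);
        }
        if (enc == 0L) {
            return null;
        }
        // top bit set means the original was non-negative
        long bits = enc < 0 ? enc ^ Long.MIN_VALUE : ~enc;
        return Double.longBitsToDouble(bits);
    }
}
```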
The variable-width encodings have it a little easier. There's already
enough going on that it's simpler to make room.
Remember, the final goal is to support order-preserving serialization.
This imposes some limitations on our encoding strategies. For instance,
it's not enough to simply encode null; it really needs to be encoded as
0x00 so that it sorts lexicographically earlier than any other value.
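The sketch below illustrates why null has to be 0x00. It shows the unsigned, byte-by-byte comparison that HBase row keys are sorted with. It also includes a hypothetical variable-width nullable string encoding: 0x00 for null, and 0x01 followed by the UTF-8 bytes otherwise. Java bytes are signed, so the comparison masks with 0xFF; without the mask, 0x80 and above would sort before 0x7F. The string encoding only orders correctly as a standalone value. Embedding it in a compound key would also need a terminator or escaping scheme.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: unsigned lexicographic comparison, under which a
// single 0x00 byte (null) sorts before any encoding starting at 0x01+.
final class NullFirst {
    private NullFirst() {}

    /** Unsigned byte-by-byte comparison; masks with 0xFF because Java bytes are signed. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length); // a strict prefix sorts first
    }

    /** Variable-width nullable string: 0x00 for null, 0x01 + UTF-8 bytes otherwise. */
    static byte[] encodeString(String s) {
        if (s == null) {
            return new byte[] {0x00};
        }
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[utf8.length + 1];
        out[0] = 0x01;
        System.arraycopy(utf8, 0, out, 1, utf8.length);
        return out;
    }
}
```

Unlike the "null as absence" approach, the 0x01 prefix keeps the empty string distinct from null: "" encodes as a single 0x01 byte and still sorts after null.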
What do you think? Any ideas, experiences, etc?
Thanks,
Nick