bq. I create a dummy qualifier with a dummy value

For any single application, the above can be done. For generic applications, how would we do this?
Thanks

On Mon, Apr 1, 2013 at 5:07 PM, Matt Corgan <mcor...@hotpads.com> wrote:
> I generally don't allow nulls in my composite row keys. Does SQL allow nulls in the PK? In the rare case I wanted to do that I might create a separate format called NullableCInt32 with 5 bytes where the first one determined null. It's important to keep the pure types pure.
>
> I have lots of null *values* however, but they're represented by lack of a qualifier in the Put. If a row has all null values, I create a dummy qualifier with a dummy value to make sure the row key gets inserted as it would in sql.
>
> On Mon, Apr 1, 2013 at 4:49 PM, James Taylor <jtay...@salesforce.com> wrote:
>> On 04/01/2013 04:41 PM, Nick Dimiduk wrote:
>>> On Mon, Apr 1, 2013 at 4:31 PM, James Taylor <jtay...@salesforce.com> wrote:
>>>> From the SQL perspective, handling null is important.
>>>
>>> From your perspective, it is critical to support NULLs, even at the expense of fixed-width encodings at all or supporting representation of a full range of values. That is, you'd rather be able to represent NULL than -2^31?
>>
>> We've been able to get away with supporting NULL through the absence of the value rather than restricting the data range. We haven't had any push back on not allowing a fixed width nullable leading row key column. Since our variable length DECIMAL supports null and is a superset of the fixed width numeric types, users have a reasonable alternative.
>>
>> I'd rather not restrict the range of values, since it doesn't seem like this would be necessary.
>>
>>>> On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
>>>>> Thanks for the thoughtful response (and code!).
>>>>>
>>>>> I'm thinking I will press forward with a base implementation that does not support nulls.
>>>>> The idea is to provide an extensible set of interfaces, so I think this will not box us into a corner later. That is, a mirroring package could be implemented that supports null values and accepts the relevant trade-offs.
>>>>>
>>>>> Thanks,
>>>>> Nick
>>>>>
>>>>> On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan <mcor...@hotpads.com> wrote:
>>>>>> I spent some time this weekend extracting bits of our serialization code to a public GitHub repo at http://github.com/hotpads/data-tools. Contributions are welcome; I'm sure we all have this stuff lying around.
>>>>>>
>>>>>> You can see I've bumped into the NULL problem in a few places:
>>>>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
>>>>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
>>>>>>
>>>>>> Looking back, I think my latest opinion on the topic is to reject nullability as the rule since it can cause unexpected behavior and confusion. It's cleaner to provide a wrapper class (so both LongArrayList plus NullableLongArrayList) that explicitly defines the behavior, and costs a little more in performance. If the user can't find a pre-made wrapper class, it's not very difficult for each user to provide their own interpretation of null and check for it themselves.
>>>>>>
>>>>>> If you reject nullability, the question becomes what to do in situations where you're implementing existing interfaces that accept nullable params. The LongArrayList above implements List<Long>, which requires an add(Long) method. In the above implementation I chose to swap nulls with Long.MIN_VALUE; however, I'm now thinking it best to force the user to make that swap and then throw IllegalArgumentException if they pass null.
>>>>>>
>>>>>> On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil <doug.m...@explorysmedical.com> wrote:
>>>>>>> Hmmm... good question.
>>>>>>>
>>>>>>> I think that fixed width support is important for a great many rowkey constructs cases, so I'd rather see something like losing MIN_VALUE and keeping fixed width.
>>>>>>>
>>>>>>> On 4/1/13 2:00 PM, "Nick Dimiduk" <ndimi...@gmail.com> wrote:
>>>>>>>> Heya,
>>>>>>>>
>>>>>>>> Thinking about data types and serialization. I think null support is an important characteristic for the serialized representations, especially when considering the compound type. However, doing so is directly incompatible with fixed-width representations for numerics. For instance, if we want to have a fixed-width signed long stored on 8 bytes, where do you put null?
>>>>>>>> float and double types can cheat a little by folding negative and positive NaNs into a single representation (this isn't strictly correct!), leaving a place to represent null. In the long example case, the obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by one. This will allocate an additional encoding which can be used for null. My experience working with scientific data, however, makes me wince at the idea.
>>>>>>>>
>>>>>>>> The variable-width encodings have it a little easier. There's already enough going on that it's simpler to make room.
>>>>>>>>
>>>>>>>> Remember, the final goal is to support order-preserving serialization. This imposes some limitations on our encoding strategies. For instance, it's not enough to simply encode null; it really needs to be encoded as 0x00 so as to sort lexicographically earlier than any other value.
>>>>>>>>
>>>>>>>> What do you think? Any ideas, experiences, etc?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Nick
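To make the fixed-width option from Nick's and Doug's messages concrete (give up Long.MIN_VALUE so that null can take the lowest encoding), here is a minimal Java sketch. It assumes a common order-preserving scheme that the thread does not spell out: flip the sign bit and write the long big-endian. Under that scheme Long.MIN_VALUE would become eight 0x00 bytes, which is exactly where Nick wants null to sort, so the sketch reserves that slot for null and rejects MIN_VALUE. The class name SentinelLongCodec is hypothetical.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical fixed-width, order-preserving codec for nullable longs.
 * Trade-off: Long.MIN_VALUE is given up so that null can own the all-zero encoding.
 */
final class SentinelLongCodec {
    /** Every encoded value occupies exactly this many bytes. */
    public static final int WIDTH = 8;

    private SentinelLongCodec() {}

    /**
     * Flipping the sign bit and writing big-endian makes unsigned byte order match
     * signed numeric order. Long.MIN_VALUE would map to eight 0x00 bytes, so that
     * slot is reserved for null and MIN_VALUE itself is rejected.
     */
    public static byte[] encode(Long v) {
        if (v == null) {
            return new byte[WIDTH]; // all 0x00: sorts before every non-null encoding
        }
        if (v == Long.MIN_VALUE) {
            throw new IllegalArgumentException("Long.MIN_VALUE is reserved for null");
        }
        // ByteBuffer writes big-endian by default.
        return ByteBuffer.allocate(WIDTH).putLong(v ^ Long.MIN_VALUE).array();
    }

    public static Long decode(byte[] bytes) {
        if (bytes.length != WIDTH) {
            throw new IllegalArgumentException("expected " + WIDTH + " bytes, got " + bytes.length);
        }
        long raw = ByteBuffer.wrap(bytes).getLong();
        if (raw == 0L) {
            return null; // the reserved null slot
        }
        return raw ^ Long.MIN_VALUE; // undo the sign-bit flip
    }
}
```

Encoded values keep numeric order when the byte arrays are compared as unsigned bytes, for example with HBase's Bytes.compareTo or java.util.Arrays.compareUnsigned (Java 9+).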
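Matt's alternative, a separate NullableCInt32 format with 5 bytes where the first byte determines null, keeps the full int range at the cost of one extra byte. A minimal sketch, assuming marker values 0x00 (null) and 0x01 (present) and the same sign-flipped big-endian payload as above; NullableInt32Codec is a hypothetical name, not code from hotpads/data-tools.

```java
import java.nio.ByteBuffer;

/** Hypothetical 5-byte nullable int: 1 marker byte + 4-byte order-preserving payload. */
final class NullableInt32Codec {
    public static final int WIDTH = 5;
    private static final byte NULL_MARKER = 0x00;  // assumed: sorts before VALUE_MARKER
    private static final byte VALUE_MARKER = 0x01; // assumed

    private NullableInt32Codec() {}

    public static byte[] encode(Integer v) {
        ByteBuffer buf = ByteBuffer.allocate(WIDTH); // zero-filled, big-endian
        if (v == null) {
            buf.put(NULL_MARKER); // remaining 4 bytes stay 0x00, so the width stays fixed
        } else {
            // Flip the sign bit so unsigned byte order matches signed int order.
            buf.put(VALUE_MARKER).putInt(v ^ Integer.MIN_VALUE);
        }
        return buf.array();
    }

    public static Integer decode(byte[] bytes) {
        if (bytes.length != WIDTH) {
            throw new IllegalArgumentException("expected " + WIDTH + " bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte marker = buf.get();
        if (marker == NULL_MARKER) {
            return null;
        }
        if (marker != VALUE_MARKER) {
            throw new IllegalArgumentException("unknown marker byte: " + marker);
        }
        return buf.getInt() ^ Integer.MIN_VALUE; // undo the sign-bit flip
    }
}
```

Because the marker byte is compared first, null sorts ahead of Integer.MIN_VALUE, and every int, including MIN_VALUE, remains representable.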
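The stricter List<Long> behavior Matt leans toward (throw IllegalArgumentException on null and leave any null-to-sentinel swap to the caller) can be sketched on top of java.util.AbstractList. StrictLongArrayList is a hypothetical class for illustration, not the LongArrayList linked above.

```java
import java.util.AbstractList;
import java.util.Arrays;

/** Hypothetical List<Long> backed by a long[] that refuses null instead of mapping it. */
final class StrictLongArrayList extends AbstractList<Long> {
    private long[] values = new long[8];
    private int size = 0;

    @Override
    public Long get(int index) {
        checkBounds(index, size);
        return values[index];
    }

    /** Primitive accessor that avoids boxing. */
    public long getLong(int index) {
        checkBounds(index, size);
        return values[index];
    }

    @Override
    public int size() {
        return size;
    }

    @Override
    public Long set(int index, Long element) {
        checkBounds(index, size);
        long previous = values[index];
        values[index] = unboxOrReject(element); // throws before any mutation
        return previous;
    }

    /** AbstractList.add(E) delegates here, so this also covers List.add(Long). */
    @Override
    public void add(int index, Long element) {
        checkBounds(index, size + 1);
        long v = unboxOrReject(element);
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2); // grow geometrically
        }
        System.arraycopy(values, index, values, index + 1, size - index);
        values[index] = v;
        size++;
        modCount++;
    }

    @Override
    public Long remove(int index) {
        checkBounds(index, size);
        long previous = values[index];
        System.arraycopy(values, index + 1, values, index, size - index - 1);
        size--;
        modCount++;
        return previous;
    }

    private static long unboxOrReject(Long element) {
        if (element == null) {
            throw new IllegalArgumentException("null is not supported; map it to a sentinel before adding");
        }
        return element;
    }

    private static void checkBounds(int index, int bound) {
        if (index < 0 || index >= bound) {
            throw new IndexOutOfBoundsException("index " + index + ", bound " + bound);
        }
    }
}
```

Overriding the indexed add is enough: AbstractList implements add(E) as add(size(), e), so both the List<Long> add(Long) path and positional inserts go through the same null check.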