On Mon, Apr 1, 2013 at 4:31 PM, James Taylor <jtay...@salesforce.com> wrote:
> From the SQL perspective, handling null is important.

From your perspective, it is critical to support NULLs, even at the expense
of having fixed-width encodings at all, or of representing the full range of
values. That is, you'd rather be able to represent NULL than -2^31?

> On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
>
>> Thanks for the thoughtful response (and code!).
>>
>> I'm thinking I will press forward with a base implementation that does not
>> support nulls. The idea is to provide an extensible set of interfaces, so I
>> think this will not box us into a corner later. That is, a mirroring
>> package could be implemented that supports null values and accepts
>> the relevant trade-offs.
>>
>> Thanks,
>> Nick
>>
>> On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan <mcor...@hotpads.com> wrote:
>>
>>> I spent some time this weekend extracting bits of our serialization code
>>> to a public github repo at http://github.com/hotpads/data-tools .
>>> Contributions are welcome - I'm sure we all have this stuff lying around.
>>>
>>> You can see I've bumped into the NULL problem in a few places:
>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
>>>
>>> Looking back, I think my latest opinion on the topic is to reject
>>> nullability as the rule since it can cause unexpected behavior and
>>> confusion.
>>> It's cleaner to provide a wrapper class (so both LongArrayList
>>> and NullableLongArrayList) that explicitly defines the behavior, and
>>> costs a little more in performance. If the user can't find a pre-made
>>> wrapper class, it's not very difficult for each user to provide their own
>>> interpretation of null and check for it themselves.
>>>
>>> If you reject nullability, the question becomes what to do in situations
>>> where you're implementing existing interfaces that accept nullable params.
>>> The LongArrayList above implements List<Long>, which requires an add(Long)
>>> method. In the above implementation I chose to swap nulls with
>>> Long.MIN_VALUE; however, I'm now thinking it best to force the user to make
>>> that swap and then throw IllegalArgumentException if they pass null.
>>>
>>> On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil <doug.m...@explorysmedical.com> wrote:
>>>
>>>> Hmmm… good question.
>>>>
>>>> I think that fixed width support is important for a great many rowkey
>>>> constructs, so I'd rather see something like losing MIN_VALUE and
>>>> keeping fixed width.
>>>>
>>>> On 4/1/13 2:00 PM, "Nick Dimiduk" <ndimi...@gmail.com> wrote:
>>>>
>>>>> Heya,
>>>>>
>>>>> Thinking about data types and serialization. I think null support is an
>>>>> important characteristic for the serialized representations, especially
>>>>> when considering the compound type. However, doing so is directly
>>>>> incompatible with fixed-width representations for numerics. For instance,
>>>>> if we want to have a fixed-width signed long stored in 8 bytes, where do
>>>>> you put null? float and double types can cheat a little by folding
>>>>> negative and positive NaNs into a single representation (this isn't
>>>>> strictly correct!), leaving a place to represent null. In the long example
>>>>> case, the obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by
>>>>> one.
>>>>> This will free up one additional encoding which can be used for null. My
>>>>> experience working with scientific data, however, makes me wince at the
>>>>> idea.
>>>>>
>>>>> The variable-width encodings have it a little easier. There's already
>>>>> enough going on that it's simpler to make room.
>>>>>
>>>>> Remember, the final goal is to support order-preserving serialization.
>>>>> This imposes some limitations on our encoding strategies. For instance,
>>>>> it's not enough to simply encode null; it really needs to be encoded as
>>>>> 0x00 so as to sort lexicographically earlier than any other value.
>>>>>
>>>>> What do you think? Any ideas, experiences, etc?
>>>>>
>>>>> Thanks,
>>>>> Nick
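
To make the trade-off in this thread concrete, here is a minimal Java sketch. It is not taken from the data-tools repo or from any HBase API; the class and method names (OrderedLongSketch, encodeFixed, encodeNullable, compareBytes) are hypothetical. It assumes keys are compared as unsigned bytes, lexicographically. encodeFixed keeps all 2^64 long values in exactly 8 bytes and therefore has no slot for null; encodeNullable follows the wrapper idea and spends one extra marker byte so that null sorts first as 0x00.

```java
/** Hypothetical illustration only; not the data-tools or HBase API. */
final class OrderedLongSketch {

    /**
     * Fixed-width, order-preserving encoding: 8 bytes, big-endian,
     * with the sign bit flipped so that negative values sort before
     * positive ones under unsigned byte comparison. Every bit pattern
     * is a valid long, so there is no spare encoding left for null.
     */
    static byte[] encodeFixed(long v) {
        long u = v ^ Long.MIN_VALUE; // flip only the sign bit
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) u;       // lowest byte goes last (big-endian)
            u >>>= 8;
        }
        return out;
    }

    /**
     * Nullable variant (assumed design): a leading marker byte.
     * null is the single byte 0x00; a value is 0x01 followed by the
     * 8-byte fixed encoding. The full long range survives, but the
     * encoding is no longer 8 bytes wide.
     */
    static byte[] encodeNullable(Long v) {
        if (v == null) {
            return new byte[] {0x00};
        }
        byte[] out = new byte[9];
        out[0] = 0x01;
        System.arraycopy(encodeFixed(v), 0, out, 1, 8);
        return out;
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

With this sketch, encodeNullable(null) compares lower than encodeNullable(Long.MIN_VALUE), while encodeFixed keeps Long.MIN_VALUE and Long.MAX_VALUE as the smallest and largest keys. Giving up MIN_VALUE instead, as Doug suggests, would keep the 8-byte width at the cost of one representable value.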