I have a small thought on that one: would 0 = null work for numerical sparse lists? Or would it be better to extend the complex types to include "vectors and matrices"?
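
To make the question concrete, here is a minimal Python sketch (not AsterixDB code; the names encode_sparse and decode_sparse are hypothetical) of the bit-mask idea from Ildar's message quoted below: a presence mask marks the non-null positions and only those items are stored. It also suggests why treating 0 as null could lose information whenever 0 is a legitimate value.

```python
from typing import List, Optional, Tuple


def encode_sparse(items: List[Optional[int]]) -> Tuple[int, bytes, List[int]]:
    """Return (length, presence bit mask, non-null values).

    Bit i of the mask (least significant bit first within each byte)
    is set when items[i] is not null.
    """
    mask = bytearray((len(items) + 7) // 8)
    values = []
    for i, item in enumerate(items):
        if item is not None:
            mask[i // 8] |= 1 << (i % 8)
            values.append(item)
    return len(items), bytes(mask), values


def decode_sparse(length: int, mask: bytes, values: List[int]) -> List[Optional[int]]:
    """Rebuild the original list, putting None wherever the mask bit is clear."""
    remaining = iter(values)
    return [
        next(remaining) if mask[i // 8] & (1 << (i % 8)) else None
        for i in range(length)
    ]


original = [0, None, 7, None, None, 0]

# The bit mask keeps 0 and null apart, so the round trip is lossless.
assert decode_sparse(*encode_sparse(original)) == original

# With a "0 means null" convention, real zeros and nulls collapse together.
with_sentinel = [0 if item is None else item for item in original]
assert with_sentinel == [0, 0, 7, 0, 0, 0]
```

The mask costs one bit per item regardless of type, whereas the sentinel approach costs nothing extra but only works if 0 can never occur as data.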

On Fri, Dec 18, 2015 at 5:45 PM, Mike Carey wrote:

> Agreed. We probably need a mini design doc here. The short term urgency
> seems to be a need to represent lists that can include nulls, as this is
> blocking JPL and is also something easily produced by queries (AQL or
> SQL++). Longer term one can imagine where this would be something that
> might vary (at the lowest level of detail) by list, e.g., you might
> represent dense and sparse lists quite differently, you might use
> compression for certain kinds of lists, etc.
>
>
> On 12/18/15 1:57 AM, Till Westmann wrote:
>
>> Hi Ildar,
>>
>> it seems that we have 2 separate points here:
>> 1) There are bugs in the way we decide which list representation to use
>> and
>> 2) we could add support for (and an optimized representation for) a list
>> of a fixed but nullable type.
>> It seems that - by fixing 1) - we could get rid of the issues you’ve
>> listed.
>>
>> But I also think that it would be nice to support lists of a nullable
>> type (feels like an omission that we don’t support that in the language) -
>> and potentially provide an efficient representation for them.
>> However, it is not clear to me how we would do this.
>> A few thoughts:
>> - Would we maintain the current representation for homogenous lists of
>> non-nullable types?
>> - Would we introduce a new type tag for “nullable lists”?
>> - Would we redefine the current representation to mean something else?
>> Do you have thoughts on those?
>>
>> Cheers,
>> Till
>>
>> On 16 Dec 2015, at 8:12, Ildar Absalyamov wrote:
>>
>>> Hi devs,
>>>
>>> Recently I have been playing around with lists and functions, which
>>> receive/return list parameters/values. I have noticed one particular issue,
>>> which seems to be incorrect.
>>> As you might know, internally we support 2 types of lists: homogeneous,
>>> where all the items are untagged and the item type is stored in the type
>>> definition, and heterogeneous, where items, on the contrary, are tagged
>>> and the list item type is effectively ANY.
>>> The decision about which of the two types is used is usually made by the
>>> parser, or is altered by IntroduceEnforcedListTypeRule, which effectively
>>> turns a heterogeneous list into a homogeneous one if all the items have
>>> the same type.
>>> Right now we only allow homogeneous lists to be defined as a field in
>>> some type, and we also restrict the item type to be non-nullable:
>>> create type listType {
>>>     "id":int64,
>>>     "list":[int64] // [int64?] is not possible
>>> }
>>>
>>> This constraint spans both the language level and serialization. Under
>>> that restriction the only way to load a list that contains null values
>>> is to make the appropriate field open (open lists are heterogeneous by
>>> definition).
>>>
>>> 1) Seems like we’re missing an optimization opportunity when we are
>>> dealing with large sparse lists. Serialization in this case might use a
>>> bit mask to specify which items in the list are not null, and then encode
>>> only those items.
>>> 2) I believe that if we alter IntroduceEnforcedListTypeRule to coerce a
>>> list into a homogeneous list with a nullable item type, we might resolve
>>> issues
>>> https://issues.apache.org/jira/browse/ASTERIXDB-905,
>>> https://issues.apache.org/jira/browse/ASTERIXDB-867, and
>>> https://issues.apache.org/jira/browse/ASTERIXDB-1131 all at once.
>>>
>>> Thoughts?
>>>
>>> Best regards,
>>> Ildar
>>>
>>
>

--
*Regards,*
Wail Alkowaileet
