Hi Till,

Thinking it through, I have also come to see these as two separate issues. For now I am going to concentrate on 1) as a quick solution to the existing bugs, as you pointed out.
From the design perspective, I was thinking that the serialization for homogeneous lists with nullable item types could be the same as for heterogeneous lists, but with a nullable type tag in place of ANY. Yes, this will require a separate type tag for the nullable type (we do have one already, ATypeTag.Union, but we do not provide a serde for it). The current homogeneous list representation should be unaffected. I am not sure what you meant by redefining the current representation.

> On Dec 18, 2015, at 01:57, Till Westmann wrote:
>
> Hi Ildar,
>
> it seems that we have 2 separate points here:
> 1) There are bugs in the way we decide which list representation to use and
> 2) we could add support for (and an optimized representation for) a list of a
> fixed but nullable type.
> It seems that - by fixing 1) - we could get rid of the issues you’ve listed.
>
> But I also think that it would be nice to support lists of a nullable type
> (feels like an omission that we don’t support that in the language) - and
> potentially provide an efficient representation for them.
> However, it is not clear to me how we would do this.
> A few thoughts:
> - Would we maintain the current representation for homogeneous lists of
> non-nullable types?
> - Would we introduce a new type tag for “nullable lists”?
> - Would we redefine the current representation to mean something else?
> Do you have thoughts on those?
>
> Cheers,
> Till
>
> On 16 Dec 2015, at 8:12, Ildar Absalyamov wrote:
>
>> Hi devs,
>>
>> Recently I have been playing around with lists and with functions that
>> receive/return list parameters/values. I have noticed one particular issue,
>> which seems to be incorrect.
>> As you might know, internally we support 2 types of lists: homogeneous,
>> where all the items are untagged and the item type is stored in the type
>> definition, and heterogeneous, where items, on the contrary, are tagged and
>> the list item type is effectively ANY.
>> The decision which of the two types is used is usually made by the parser or
>> is altered by IntroduceEnforcedListTypeRule, which effectively turns a
>> heterogeneous list into a homogeneous one if all the items have the same type.
>> Right now we only allow homogeneous lists to be defined as a field in some
>> type, and we also restrict the item type to be non-nullable:
>> create type listType {
>>   "id":int64,
>>   "list":[int64] // [int64?] is not possible
>> }
>>
>> This constraint spans both the language level and serialization.
>> Under that restriction the only way to load a list that contains null
>> values would be to make the appropriate field open (open lists are
>> heterogeneous by definition).
>>
>> 1) It seems we’re missing an optimization opportunity when dealing with
>> large sparse lists. Serialization in this case might use a bit mask to
>> specify which items in the list are not null, and then encode only those
>> items.
>> 2) I believe if we alter IntroduceEnforcedListTypeRule to enforce a
>> homogeneous list with a nullable item type we might resolve issues
>> https://issues.apache.org/jira/browse/ASTERIXDB-905,
>> https://issues.apache.org/jira/browse/ASTERIXDB-867 and
>> https://issues.apache.org/jira/browse/ASTERIXDB-1131 all at once.
>>
>> Thoughts?
>>
>> Best regards,
>> Ildar

Best regards,
Ildar
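
P.S. To make the bit mask idea from point 1) of the original message a bit more concrete, here is a rough Python sketch. Everything in it is an assumption made for illustration only: the function names encode_nullable_list / decode_nullable_list are hypothetical, and the layout (4-byte item count, one presence bit per item, then only the non-null values as big-endian int64) is not the actual AsterixDB serialization format.

```python
import struct


def encode_nullable_list(items):
    """Encode a list of int64-or-None values (hypothetical layout).

    Layout: 4-byte item count, presence bit mask (bit set = item is not
    null), then only the non-null values as big-endian int64.
    """
    count = len(items)
    mask = bytearray((count + 7) // 8)  # one bit per item, rounded up to bytes
    values = []
    for i, item in enumerate(items):
        if item is not None:
            mask[i // 8] |= 1 << (i % 8)
            values.append(item)
    header = struct.pack(">i", count)
    payload = struct.pack(f">{len(values)}q", *values)
    return header + bytes(mask) + payload


def decode_nullable_list(data):
    """Inverse of encode_nullable_list: rebuild the list, nulls included."""
    (count,) = struct.unpack_from(">i", data, 0)
    mask_len = (count + 7) // 8
    mask = data[4:4 + mask_len]
    offset = 4 + mask_len
    items = []
    for i in range(count):
        if mask[i // 8] & (1 << (i % 8)):
            (value,) = struct.unpack_from(">q", data, offset)
            offset += 8
            items.append(value)
        else:
            items.append(None)  # null items occupy no space in the payload
    return items
```

In this sketch a 100-item list with only two non-null values takes 4 + 13 + 16 = 33 bytes, since nulls cost one bit each in the mask and nothing in the payload.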
