Hi Till,

While thinking this through I also realized that these are two separate issues. For 
now I am going to concentrate on 1) as a quick solution to the existing bugs, as 
you have pointed out.

From the design perspective, I was thinking that the serialization for 
homogeneous lists with nullable item types could be the same as for heterogeneous 
lists, but instead of ANY the list would carry a nullable type tag. 
Yes, this will require a separate type tag for the nullable type (we already have 
one, ATypeTag.Union, but we do not provide serde for it). 
The current homogeneous list representation should be unaffected.
I am not sure what you meant by redefining the current representation.
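
To make the idea a bit more concrete, here is a rough sketch of what I have in 
mind. It is only an illustration and not the actual AsterixDB serializer: the 
class name, the tag byte values and the header layout (list tag, nullable tag, 
item type tag, item count) are all assumptions made up for this example. Every 
item is tagged, as in a heterogeneous list, but the tag can only be the declared 
item type or NULL.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names and byte values are hypothetical.
final class NullableListSketch {
    // Made-up tag values, not the real ATypeTag bytes.
    static final byte TAG_NULL = 1;
    static final byte TAG_INT64 = 2;
    static final byte TAG_NULLABLE = 3;     // stands in for a nullable item type tag
    static final byte TAG_ORDERED_LIST = 4;

    // Assumed layout: [list tag][nullable tag][item type tag][count][tagged items...]
    static byte[] serialize(List<Long> items) {
        int size = 3 + Integer.BYTES;
        for (Long item : items) {
            size += (item == null) ? 1 : 1 + Long.BYTES;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.put(TAG_ORDERED_LIST);
        buf.put(TAG_NULLABLE);   // a heterogeneous list would carry ANY here
        buf.put(TAG_INT64);      // the single non-null item type
        buf.putInt(items.size());
        for (Long item : items) {
            if (item == null) {
                buf.put(TAG_NULL);          // a null item is just its tag
            } else {
                buf.put(TAG_INT64);
                buf.putLong(item);
            }
        }
        return buf.array();
    }

    static List<Long> deserialize(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        if (buf.get() != TAG_ORDERED_LIST || buf.get() != TAG_NULLABLE
                || buf.get() != TAG_INT64) {
            throw new IllegalArgumentException("unexpected list header");
        }
        int count = buf.getInt();
        List<Long> items = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte tag = buf.get();
            if (tag == TAG_NULL) {
                items.add(null);
            } else if (tag == TAG_INT64) {
                items.add(buf.getLong());
            } else {
                throw new IllegalArgumentException("unexpected item tag " + tag);
            }
        }
        return items;
    }

    public static void main(String[] args) {
        List<Long> original = Arrays.asList(1L, null, 3L);
        byte[] encoded = serialize(original);
        System.out.println(deserialize(encoded).equals(original));
    }
}
```

The untagged layout for lists of non-nullable types is not touched by this 
sketch; it only adds a second, tagged layout that is selected by the nullable 
tag in the header.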

> On Dec 18, 2015, at 01:57, Till Westmann <[email protected]> wrote:
> 
> Hi Ildar,
> 
> it seems that we have 2 separate points here:
> 1) There are bugs in the way we decide which list representation to use and
> 2) we could add support for (and an optimized representation for) a list of a 
> fixed but nullable type.
> It seems that - by fixing 1) - we could get rid of the issues you’ve listed.
> 
> But I also think that it would be nice to support lists of a nullable type 
> (feels like an omission that we don’t support that in the language) - and 
> potentially provide an efficient representation for them.
> However, it is not clear to me how we would do this.
> A few thoughts:
> - Would we maintain the current representation for homogenous lists of 
> non-nullable types?
> - Would we introduce a new type tag for “nullable lists”?
> - Would we redefine the current representation to mean something else?
> Do you have thoughts on those?
> 
> Cheers,
> Till
> 
> On 16 Dec 2015, at 8:12, Ildar Absalyamov wrote:
> 
>> Hi devs,
>> 
>> Recently I have been playing around with lists and functions that 
>> receive or return list parameters/values. I have noticed one particular 
>> behavior that seems to be incorrect.
>> As you might know, internally we support 2 types of lists: homogeneous, 
>> where all the items are untagged and the item type is stored in the type 
>> definition, and heterogeneous, where the items are, on the contrary, tagged 
>> and the list item type is effectively ANY.
>> The decision on which of the two types to use is usually made by the parser 
>> or is altered by IntroduceEnforcedListTypeRule, which effectively turns a 
>> heterogeneous list into a homogeneous one if all the items have the same type.
>> Right now we allow only homogeneous lists to be defined as a field in a 
>> type, and we also restrict the item type to non-nullable types:
>> create type listType {
>> "id":int64,
>> "list":[int64]   // [int64?] is not possible
>> }
>> 
>> This constraint spans both the language level and serialization. 
>> Under that restriction the only way to load a list that contains null 
>> values is to make the corresponding field open (open lists are 
>> heterogeneous by definition).
>> 
>> 1) It seems we are missing an optimization opportunity when dealing 
>> with large sparse lists. Serialization in this case could use a bit mask to 
>> specify which items in the list are not null, and then encode only those 
>> items.
>> 2) I believe that if we alter IntroduceEnforcedListTypeRule to turn such a 
>> list into a homogeneous list with a nullable item type, we might resolve issues 
>> https://issues.apache.org/jira/browse/ASTERIXDB-905, 
>> https://issues.apache.org/jira/browse/ASTERIXDB-867, 
>> https://issues.apache.org/jira/browse/ASTERIXDB-1131 all at once.
>> 
>> Thoughts?
>> 
>> Best regards,
>> Ildar

Best regards,
Ildar
