Hi devs,

Recently I have been playing around with lists and with functions that 
take list parameters or return list values. I noticed one particular 
behavior that seems incorrect.
As you might know, internally we support two types of lists: homogeneous, 
where all the items are untagged and the item type is stored in the type 
definition, and heterogeneous, where on the contrary the items are tagged 
and the list item type is effectively ANY.
The decision about which of the two types to use is usually made by the 
parser, or is altered by IntroduceEnforcedListTypeRule, which effectively 
turns a heterogeneous list into a homogeneous one if all the items have the 
same type.
Right now we allow only homogeneous lists to be defined as a field in a 
type, and we also restrict the item type to be non-nullable:
create type listType {
  "id":int64,
  "list":[int64]   // [int64?] is not possible
}

This constraint applies at both the language level and the serialization 
level. Under that restriction, the only way to load a list that contains 
null values is to make the corresponding field open (open lists are 
heterogeneous by definition).
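
To make the distinction concrete, here is a rough sketch in Python (not AsterixDB code) of the decision as described above. The function names and the string type tags are made up for illustration, and the handling of nulls reflects my reading of the current restriction rather than the actual rule implementation.

```python
from typing import Any, List


def type_tag(value: Any) -> str:
    """Hypothetical type tag for a value (not AsterixDB's real type tags)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    return "any"


def enforced_item_type(items: List[Any]) -> str:
    """Return the shared item type if the list can be homogeneous, else 'any'.

    Assumption: a list containing a null cannot be homogeneous today,
    because the item type of a homogeneous list must be non-nullable.
    """
    tags = {type_tag(v) for v in items}
    if len(tags) == 1 and "null" not in tags and "any" not in tags:
        return next(iter(tags))
    return "any"
```

Under these assumptions, [1, 2, 3] is typed as a homogeneous list of int64, while both [1, "a"] and [1, null, 3] fall back to a heterogeneous list with item type ANY.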

1) It seems we’re missing an optimization opportunity when dealing with 
large sparse lists. Serialization in this case could use a bit mask to 
specify which items in the list are not null, and then encode only those 
items.
2) I believe that if we alter IntroduceEnforcedListTypeRule to enforce a 
homogeneous list with a nullable item type, we might resolve issues 
https://issues.apache.org/jira/browse/ASTERIXDB-905, 
https://issues.apache.org/jira/browse/ASTERIXDB-867 and 
https://issues.apache.org/jira/browse/ASTERIXDB-1131 all at once.
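
For point 1, the bit mask idea could look roughly like the following Python sketch. It is only an illustration: the function names, the (length, mask, payload) layout, and the fixed 8-byte big-endian encoding of int64 items are my assumptions, not the existing AsterixDB serialization format.

```python
from typing import List, Optional, Tuple


def encode_sparse(items: List[Optional[int]]) -> Tuple[int, bytes, bytes]:
    """Encode a list of nullable int64 values as (length, bit mask, payload).

    Bit i of the mask is set when items[i] is not null; only those items
    are written to the payload, 8 bytes each.
    """
    mask = bytearray((len(items) + 7) // 8)
    payload = bytearray()
    for i, item in enumerate(items):
        if item is not None:
            mask[i // 8] |= 1 << (i % 8)
            payload += item.to_bytes(8, "big", signed=True)
    return len(items), bytes(mask), bytes(payload)


def decode_sparse(length: int, mask: bytes, payload: bytes) -> List[Optional[int]]:
    """Rebuild the original list, restoring nulls where the mask bit is clear."""
    items: List[Optional[int]] = []
    offset = 0
    for i in range(length):
        if mask[i // 8] & (1 << (i % 8)):
            items.append(int.from_bytes(payload[offset:offset + 8], "big", signed=True))
            offset += 8
        else:
            items.append(None)
    return items
```

With this layout, a list of 1000 items where only two are non-null needs a 125-byte mask plus 16 bytes of payload, compared with 8000 bytes if every slot stored a full 8-byte value.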

Thoughts?

Best regards,
Ildar
