[ 
https://issues.apache.org/jira/browse/DRILL-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908533#comment-14908533
 ] 

Jason Altekruse commented on DRILL-3831:
----------------------------------------

I think this might not be the correct issue for this problem. The issue 
discussed in DRILL-2796 is actually related to untyped nulls. While we will 
want to support untyped nulls in a list (such as the JSON below), this issue is 
primarily concerned with allowing nulls in lists where the type is known. 
Another issue, DRILL-3806, was opened recently to allow for general untyped 
nulls. These two units of work will have to be combined to allow for untyped 
nulls in lists. However, I don't know if we are actually using the Drill 
repeated type to represent the members of the IN list. If that is the case, 
then the combination of these two JIRAs would be needed to solve the problem.

> Allow null values in lists
> --------------------------
>
>                 Key: DRILL-3831
>                 URL: https://issues.apache.org/jira/browse/DRILL-3831
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Data Types
>            Reporter: Jason Altekruse
>            Assignee: Jason Altekruse
>             Fix For: 1.3.0
>
>
> Drill currently fails to read a JSON file where a list contains a null value. 
> We have a workaround with all_text_mode for this case, but we need to 
> enhance Drill to support this concept in the core ValueVector data structure 
> used to represent records.
> As part of this change, I am considering removing the concept of a list that 
> requires all of its members to be non-null, effectively the only type of list 
> we have today. The data that can be read today would simply be read into a 
> list where the members could be nullable, but they all happen to be non-null. 
> This would simplify the code by removing the need to cover the null and 
> non-null cases explicitly.
> Initially this could carry the risk of a minor performance hit, but overall 
> our approach with complex data has not been heavily performance tested. 
> Keeping the code simple for now will at least allow for more thorough testing 
> of the smaller number of cases, and hopefully make it easier to reason about 
> and improve as we evaluate the performance of Drill with complex data more 
> thoroughly.
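
To make the proposal in the description concrete, here is a minimal sketch of a list container whose members are always nullable, so that data without nulls is just the case where every validity bit happens to be set. This is not Drill's ValueVector API: the class name `NullableIntListVector`, its methods, and the use of `java.util.BitSet` for validity are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical illustration only; not a class from the Drill code base. */
public class NullableIntListVector {
    // offsets.get(i) .. offsets.get(i + 1) delimit the members of list i.
    private final List<Integer> offsets = new ArrayList<>();
    // Flat storage for the members of every list.
    private int[] values = new int[8];
    // Bit i is set when values[i] holds a real (non-null) member.
    private final BitSet isSet = new BitSet();
    private int valueCount = 0;

    public NullableIntListVector() {
        offsets.add(0);
    }

    /** Appends one list; a null member is recorded by leaving its bit clear. */
    public void addList(Integer... members) {
        for (Integer member : members) {
            if (valueCount == values.length) {
                values = Arrays.copyOf(values, valueCount * 2);
            }
            if (member != null) {
                values[valueCount] = member;
                isSet.set(valueCount);
            }
            valueCount++;
        }
        offsets.add(valueCount);
    }

    /** Reads list number index back, restoring nulls from the validity bits. */
    public List<Integer> getList(int index) {
        List<Integer> out = new ArrayList<>();
        for (int i = offsets.get(index); i < offsets.get(index + 1); i++) {
            out.add(isSet.get(i) ? Integer.valueOf(values[i]) : null);
        }
        return out;
    }
}
```

With a single code path like this, a list such as [1, null, 3] and a list such as [4, 5] are stored the same way; the second simply has all of its validity bits set. The trade-off described above is the extra validity check on every read, even for data that never contains nulls.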



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
