[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15785593#comment-15785593
 ] 

ASF GitHub Bot commented on DRILL-3562:
---------------------------------------

GitHub user Serhii-Harnyk opened a pull request:

    https://github.com/apache/drill/pull/713

    DRILL-3562: Query fails when using flatten on JSON data where some 
documents have an empty array

    1. Added a set that tracks ListWriters so that empty arrays are kept and 
can be initialized later in the ensureAtLeastOneField method. 
    2. Added a check that prevents generating a schema with field type "Late" 
and mode "Optional"; such fields now get type "Int" in the FlattenRecordBatch 
class.
    3. Added unit tests covering flatten queries on JSON with empty arrays.
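
The idea behind the first two changes can be pictured with a simplified model.
What follows is only a minimal sketch under stated assumptions, not the code in
the pull request: the class EmptyListTracker, the FieldType enum, and the
methods startList and writeElement are hypothetical names, and the sketch
collapses the ListWriter tracking and the "Late"-to-"Int" replacement into one
class for brevity.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model for illustration; none of these names are Drill APIs.
class EmptyListTracker {
    enum FieldType { LATE, INT, MAP }

    // Field name -> element type known so far for that list.
    private final Map<String, FieldType> schema = new LinkedHashMap<>();
    // Lists that were opened but have not received any element yet.
    private final Set<String> emptyLists = new LinkedHashSet<>();

    // A list field is seen for the first time; its element type is unknown.
    void startList(String field) {
        if (!schema.containsKey(field)) {
            schema.put(field, FieldType.LATE);
            emptyLists.add(field);
        }
    }

    // An element arrives, so the list is no longer empty and its type is known.
    void writeElement(String field, FieldType type) {
        schema.put(field, type);
        emptyLists.remove(field);
    }

    // Give every list that is still empty a concrete placeholder type,
    // so that no field leaves this step with the unresolved LATE type.
    void ensureAtLeastOneField() {
        for (String field : emptyLists) {
            schema.put(field, FieldType.INT);
        }
        emptyLists.clear();
    }

    FieldType typeOf(String field) {
        return schema.get(field);
    }

    public static void main(String[] args) {
        EmptyListTracker tracker = new EmptyListTracker();
        tracker.startList("c");            // models { "a": { "b": { "c": [] } } }
        tracker.ensureAtLeastOneField();
        System.out.println(tracker.typeOf("c")); // prints INT
    }
}
```

In this model, a list that does receive an element keeps the type of that
element, and only lists that stay empty fall back to the placeholder type.
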


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Serhii-Harnyk/drill DRILL-3562

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/713.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #713
    
----
commit 815de0cc6f16b247b3a655007241a074e38394c7
Author: Serhii-Harnyk <[email protected]>
Date:   2016-12-20T16:55:41Z

    DRILL-3562: Query fails when using flatten on JSON data where some 
documents have an empty array

----


> Query fails when using flatten on JSON data where some documents have an 
> empty array
> ------------------------------------------------------------------------------------
>
>                 Key: DRILL-3562
>                 URL: https://issues.apache.org/jira/browse/DRILL-3562
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.1.0
>            Reporter: Philip Deegan
>            Assignee: Serhii Harnyk
>             Fix For: Future
>
>
> A Drill query that uses flatten fails when some records contain an empty 
> array:
> {noformat}
> SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) 
> flat WHERE flat.c.d.e = 'f' limit 1;
> {noformat}
> Succeeds on:
> { "a": { "b": { "c": [  { "d": {  "e": "f" } } ] } } }
> Fails on:
> { "a": { "b": { "c": [] } } }
> Error:
> {noformat}
> Error: SYSTEM ERROR: ClassCastException: Cannot cast 
> org.apache.drill.exec.vector.NullableIntVector to 
> org.apache.drill.exec.vector.complex.RepeatedValueVector
> {noformat}
> Is it possible to ignore the empty arrays, or do they need to be populated 
> with dummy data?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
