[
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956645#comment-15956645
]
Arina Ielchiieva commented on DRILL-3562:
-----------------------------------------
I see, one more point then. DRILL-3562 changed the JsonReader and
FlattenRecordBatch classes. If there are no empty arrays in the json files, only
the changes in FlattenRecordBatch could have had any influence.
The error message from DRILL-5399, "Flatten does not support inputs of non-list
values.", can be thrown in two places:
1. In
https://github.com/apache/drill/blob/ddcf89548bd33c0cd3e062f1f6d5027eed822372/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java#L282
but that is before the code changes made in DRILL-3562.
2. In
https://github.com/apache/drill/blob/ddcf89548bd33c0cd3e062f1f6d5027eed822372/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java#L139
but this one concerns the value vector that is taken from the incoming
batch, not from the FlattenRecordBatch where the changes were made.
Also there is [a
check|https://github.com/apache/drill/blob/ddcf89548bd33c0cd3e062f1f6d5027eed822372/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java#L321]
in FlattenRecordBatch which keeps data from the queries in DRILL-5399 from
reaching the changes made in DRILL-3562. So far I don't see any relation
between DRILL-3562 and DRILL-5399.
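To make the two cases concrete, here is a rough Python model (not Drill code) of what FLATTEN is expected to do with the two sample records from the DRILL-3562 description. The names `flatten` and `count_matches` are made up for illustration. That an empty array should simply produce no rows is an assumption about the intended behaviour. The error text is the one quoted from DRILL-5399, but raising it as a Python `TypeError` is only part of this sketch.

```python
import json

# The two sample records from the DRILL-3562 description.
RECORDS = [
    '{ "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } }',
    '{ "a": { "b": { "c": [] } } }',
]


def flatten(values):
    """Yield one output row per array element.

    Assumption for this sketch: an empty array yields no rows at all.
    """
    for value in values:
        if not isinstance(value, list):
            # Message text quoted from DRILL-5399; the exception type is
            # a stand-in chosen for this sketch.
            raise TypeError("Flatten does not support inputs of non-list values.")
        yield from value


def count_matches(lines):
    """Rough model of the query: flatten t.a.b.c, then count rows where c.d.e = 'f'."""
    arrays = (json.loads(line)["a"]["b"]["c"] for line in lines)
    return sum(1 for c in flatten(arrays) if c["d"]["e"] == "f")
```

In this model the record with the empty array contributes nothing, so `count_matches(RECORDS)` counts only the row from the first record, while a non-list input such as a plain integer is rejected with the DRILL-5399 message.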
> Query fails when using flatten on JSON data where some documents have an
> empty array
> ------------------------------------------------------------------------------------
>
> Key: DRILL-3562
> URL: https://issues.apache.org/jira/browse/DRILL-3562
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - JSON
> Affects Versions: 1.1.0
> Reporter: Philip Deegan
> Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
>
> A Drill query using flatten fails when some records contain an empty array.
> {noformat}
> SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t)
> flat WHERE flat.c.d.e = 'f' limit 1;
> {noformat}
> Succeeds on
> { "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } }
> Fails on
> { "a": { "b": { "c": [] } } }
> Error
> {noformat}
> Error: SYSTEM ERROR: ClassCastException: Cannot cast
> org.apache.drill.exec.vector.NullableIntVector to
> org.apache.drill.exec.vector.complex.RepeatedValueVector
> {noformat}
> Is it possible to ignore the empty arrays, or do they need to be populated
> with dummy data?