[ https://issues.apache.org/jira/browse/DRILL-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961322#comment-16961322 ]

Paul Rogers commented on DRILL-7426:
------------------------------------

[~cgivre], the query in question used the wildcard, which asks to read all 
columns. In general, the reader cannot predict the future: when it starts 
reading `info`, it cannot tell that the column will later contain mixed types.
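To illustrate why a wildcard scan can fail, here is a toy model in Python of a one-pass reader that fixes a list's element type from its first non-null element and rejects any later element of a different type. It is hypothetical: it is not Drill's reader code, and the type names (`BIT`, `BIGINT`, `FLOAT8`, `VARCHAR`, `LIST`, `MAP`) are only chosen to echo Drill's. Run on the `info` value from the issue, it produces an error of the same shape as the one reported.

{code:python}
import json

def json_type(value):
    """Map a parsed JSON value to a Drill-style type name (simplified, hypothetical)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return "BIT"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "FLOAT8"
    if isinstance(value, str):
        return "VARCHAR"
    if isinstance(value, list):
        return "LIST"
    return "MAP"  # a JSON object

def check_list(values):
    """Fix the list's element type from its first non-null element, then reject
    any later element of a different type, as a one-pass reader would."""
    list_type = None
    for v in values:
        t = json_type(v)
        if t == "NULL":
            continue
        if list_type is None:
            list_type = t  # decided before the rest of the list has been seen
        elif t != list_type:
            raise TypeError(
                f"In a list of type {list_type}, encountered a value of type {t}. "
                "Drill does not support lists of different types."
            )
        if t == "LIST":
            check_list(v)  # nested lists get the same one-pass treatment
    return list_type

record = json.loads('{"name": "toto", "info": [["LOAD", []]], "response": 1}')
try:
    check_list(record["info"])
except TypeError as err:
    # prints: In a list of type VARCHAR, encountered a value of type LIST. Drill does not support lists of different types.
    print(err)
{code}

The inner list `["LOAD", []]` is typed VARCHAR from its first element, so the empty array that follows is rejected; nothing the reader had seen earlier could have warned it.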

However, Drill should work if the query were `SELECT name, response FROM ...`, 
which does not project `info`. If it does not, that is a fixable bug.

The catch is that the user seems to need the `info` data as well. One 
workaround is to rewrite the JSON so that the array is represented as an object:

{noformat}
{
    "name": "toto",
    "info": { "command": "LOAD", "values": [] },
    "response": 1
}
{noformat}

But here we run into the empty-array issue: because `values` is empty, we don't 
know its element type.
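The rewrite could be done as a small ETL step before Drill reads the file. The following is a minimal Python sketch, assuming (hypothetically) that every record's `info` holds exactly one `[command, values]` pair, as in the sample file; the helper names `iter_records`, `rewrite_info` and `rewrite_text` are illustrative only. It accepts concatenated, possibly pretty-printed JSON objects and emits one rewritten object per line.

{code:python}
import json

def iter_records(text):
    """Yield each JSON object from concatenated (possibly pretty-printed) JSON text."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip whitespace between objects; raw_decode does not skip it itself.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

def rewrite_info(record):
    """Turn "info": [[command, values]] into "info": {"command": ..., "values": ...}.

    Hypothetical assumption: info holds exactly one [command, values] pair;
    a list of any other shape raises ValueError from the unpacking below.
    """
    [[command, values]] = record["info"]
    out = dict(record)  # copy; key order (and so output order) is preserved
    out["info"] = {"command": command, "values": values}
    return out

def rewrite_text(text):
    """Rewrite every record and emit one JSON object per line."""
    for record in iter_records(text):
        yield json.dumps(rewrite_info(record))

sample = '{\n    "name": "toto",\n    "info": [["LOAD", []]],\n    "response": 1\n}\n'
for line in rewrite_text(sample):
    print(line)  # {"name": "toto", "info": {"command": "LOAD", "values": []}, "response": 1}
{code}

Note that `values` stays an empty array, so this rewrite by itself still gives Drill no element type for `values`.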

In general, JSON can represent a wider set of data structures than relational 
tuples. How much of that variety Drill should handle has always been an open 
question. I think most users end up running an ETL step to convert the data 
into a relational format (and then store it in Parquet for better 
performance). So, one could debate whether it is worth adding more complexity 
to Drill.

> Json support lists of different types
> -------------------------------------
>
>                 Key: DRILL-7426
>                 URL: https://issues.apache.org/jira/browse/DRILL-7426
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 1.16.0
>            Reporter: benj
>            Priority: Trivial
>
> With a file.json like
> {code:json}
> {
>     "name": "toto",
>     "info": [["LOAD", []]],
>     "response": 1
> }
> {code}
> A simple SELECT gives an error:
> {code:sql}
> apache drill> SELECT * FROM dfs.test.`file.json`;
> Error: UNSUPPORTED_OPERATION ERROR: In a list of type VARCHAR, encountered a 
> value of type LIST. Drill does not support lists of different types.
> {code}
> But there is an option, _exec.enable_union_type_, that allows such queries:
> {code:sql}
> apache drill> ALTER SESSION SET `exec.enable_union_type` = true;
> apache drill> SELECT * FROM dfs.test.`file.json`;
> +------+---------------+----------+
> | name |     info      | response |
> +------+---------------+----------+
> | toto | [["LOAD",[]]] | 1        |
> +------+---------------+----------+
> 1 row selected (0.283 seconds)
> {code}
> The need for this option is not evident, so it would be useful for the error 
> message to mention that it can be set, for example:
> {noformat}
> Error: UNSUPPORTED_OPERATION ERROR: In a list of type VARCHAR, encountered a 
> value of type LIST. Drill does not support lists of different types. .... SET 
> the option 'exec.enable_union_type' to true and try again;
> {noformat}
> This behaviour is already used for other errors, for example:
> {noformat}
> ...
> Error: UNSUPPORTED_OPERATION ERROR: This query cannot be planned possibly due 
> to either a cartesian join or an inequality join. 
> If a cartesian or inequality join is used intentionally, set the option 
> 'planner.enable_nljoin_for_scalar_only' to false and try again.
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
