[ https://issues.apache.org/jira/browse/DRILL-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457919#comment-16457919 ]

Paul Rogers commented on DRILL-6359:
------------------------------------

Using the new functions from DRILL-6361, we can very clearly see the problem in 
{{sqlline}}:

{noformat}
ALTER SESSION SET `store.json.all_text_mode` = true;
SELECT a, b,
    sqlTypeOf(b) AS b_type, modeof(b) AS b_mode
FROM `gen/70kmissing.json`
WHERE mod(a, 70000) = 1;
+--------+-------+--------------------+-----------+
|   a    |   b   |       b_type       |  b_mode   |
+--------+-------+--------------------+-----------+
| 1      | null  | INTEGER            | NULLABLE  |
| 70001  | 10.5  | CHARACTER VARYING  | NULLABLE  |
+--------+-------+--------------------+-----------+
{noformat}
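
For anyone who wants to reproduce this, a test file shaped like the one in the description can be generated with a short script. This is only a sketch: the function name, the exact record count, and the use of quoted JSON keys are assumptions, not part of the original report.

```python
import json


def write_missing_column_file(path, last_a=70001, b_value=10.5):
    """Write one JSON object per line; only the final record has column b.

    Hypothetical helper: records a = 0 .. last_a - 1 omit b, and the final
    record (a = last_a) carries b, so b first appears late in the file.
    Returns the number of records written.
    """
    with open(path, "w") as out:
        for a in range(last_a):
            out.write(json.dumps({"a": a}) + "\n")
        out.write(json.dumps({"a": last_a, "b": b_value}) + "\n")
    return last_a + 1
```

Calling {{write_missing_column_file("70kmissing.json")}} writes 70,002 lines, ending with the single record that contains {{b}}.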

> All-text mode in JSON still reads missing column as Nullable Int
> ----------------------------------------------------------------
>
>                 Key: DRILL-6359
>                 URL: https://issues.apache.org/jira/browse/DRILL-6359
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0, 1.14.0
>            Reporter: Paul Rogers
>            Priority: Major
>
> Suppose we have the following file:
> {noformat}
> {a: 0}
> {a: 1}
> ...
> {a: 70001, b: 10.5}
> {noformat}
> Here "..." stands for roughly another 70K records, chosen so that {{b}} first 
> appears in a second or later batch.
> Suppose we execute the following query:
> {code}
> ALTER SESSION SET `store.json.all_text_mode` = true;
> SELECT a, b FROM `70Kmissing.json` WHERE b IS NOT NULL ORDER BY a;
> {code}
> The query should work. We have an explicit project for column {{b}} and we've 
> told JSON to always use text. So, JSON should have enough information to 
> create column {{b}} as {{Nullable VarChar}}.
> Yet, the result of the query in {{sqlline}} is:
> {noformat}
> Error: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External 
> Sort. Please enable Union type.
> Previous schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` 
> (INT:OPTIONAL)]], selectionVector=NONE]
> Incoming schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` 
> (VARCHAR:OPTIONAL)]], selectionVector=NONE]
> {noformat}
> The expected result is that the query works. Even missing columns should be 
> subject to the "all text mode" setting, because the JSON reader handles 
> projection push-down and is responsible for filling in the missing columns.
> This is with the shipping Drill 1.13 JSON reader. I *think* this is fixed in 
> the "batch size handling" JSON reader rewrite, but I've not tested it.
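
The mechanism described above can be illustrated with a toy model. This is not Drill code: the batch size of 4096, the function names, and the type labels are assumptions chosen only to show why filling a missing column as INT in early batches produces a schema change once the real column shows up.

```python
def batch_schemas(records, projected, batch_size, missing_type):
    """Return a column -> type map per batch, assuming all-text mode.

    Columns present in a batch are typed VARCHAR; projected columns that
    never appear in the batch get `missing_type` (hypothetical parameter).
    """
    schemas = []
    for start in range(0, len(records), batch_size):
        seen = set()
        for rec in records[start:start + batch_size]:
            seen.update(rec)
        schemas.append(
            {col: ("VARCHAR" if col in seen else missing_type) for col in projected}
        )
    return schemas


def has_schema_change(schemas):
    """True if any batch's schema differs from the one before it."""
    return any(prev != cur for prev, cur in zip(schemas, schemas[1:]))


# Same shape as the file in the description: only the last record has b.
records = [{"a": i} for i in range(70001)] + [{"a": 70001, "b": 10.5}]

reported = batch_schemas(records, ["a", "b"], 4096, missing_type="INT")
expected = batch_schemas(records, ["a", "b"], 4096, missing_type="VARCHAR")
```

In this model, {{has_schema_change(reported)}} is true because the last batch switches {{b}} from INT to VARCHAR, while {{has_schema_change(expected)}} is false, which is the behavior the description asks for.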



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
