Paul Rogers created DRILL-6359:
----------------------------------

             Summary: All-text mode in JSON still reads missing column as Nullable Int
                 Key: DRILL-6359
                 URL: https://issues.apache.org/jira/browse/DRILL-6359
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.13.0
            Reporter: Paul Rogers


Suppose we have the following file:

{noformat}
{a: 0}
{a: 1}
...
{a: 70001, b: 10.5}
{noformat}

Here the "..." stands for roughly another 70K records, chosen so that {{b}} 
first appears in a second or later batch.
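For anyone who wants to reproduce this, a file of that shape could be generated with a short script. This is only a sketch: the function name {{write_repro_file}} and its {{last_a}} parameter are made up for illustration, and the script writes strictly valid JSON with quoted keys rather than the unquoted shorthand shown above.

```python
import json


def write_repro_file(path, last_a=70001):
    """Write records a=0..last_a, one per line; only the last one has b.

    Hypothetical helper, not part of Drill. Keys are quoted so that the
    output is strict JSON.
    """
    with open(path, "w", encoding="utf-8") as out:
        # Records 0 .. last_a-1 carry only column a.
        for a in range(last_a):
            out.write(json.dumps({"a": a}) + "\n")
        # The final record is the first (and only) one that carries b.
        out.write(json.dumps({"a": last_a, "b": 10.5}) + "\n")


if __name__ == "__main__":
    write_repro_file("70Kmissing.json")
```

The default produces records with {{a}} running from 0 to 70001, with {{b}} present only in the last record. Whether that is enough to push {{b}} into a later batch depends on the reader's batch size.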

Suppose we execute the following query:

{code}
ALTER SESSION SET `store.json.all_text_mode` = true;
SELECT a, b FROM `70Kmissing.json` WHERE b IS NOT NULL ORDER BY a;
{code}

The query should work. We have an explicit project for column {{b}} and we've 
told JSON to always use text. So, JSON should have enough information to create 
column {{b}} as {{Nullable VarChar}}.

Yet, the result of the query in {{sqlline}} is:

{noformat}
Error: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External Sort. Please enable Union type.

Previous schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (INT:OPTIONAL)]], selectionVector=NONE]
Incoming schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (VARCHAR:OPTIONAL)]], selectionVector=NONE]
{noformat}

The expected result is that the query succeeds. Even missing columns should be 
subject to the "all text mode" setting, because the JSON reader handles 
projection push-down and is responsible for filling in the missing columns.
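The expected behavior can be stated as a small toy model. This is not Drill's reader code: {{batch_schema}} and its parameters are hypothetical names, and the only thing modeled is which type a projected-but-missing column receives in a given batch.

```python
def batch_schema(projected, seen_types, all_text_mode):
    """Return {column: (type, mode)} for one batch.

    Toy model only, not Drill code. `seen_types` maps the columns that
    actually appeared in the batch to their types; any other projected
    column is filled in by the reader.
    """
    # Expected rule: in all-text mode a missing column is filled in as
    # VARCHAR. Otherwise the model uses nullable INT, the type that the
    # error message above shows for the missing column.
    fill_type = "VARCHAR" if all_text_mode else "INT"
    return {
        col: (seen_types.get(col, fill_type), "OPTIONAL")
        for col in projected
    }


# First batch: b never appears. Later batch: b appears as text.
first = batch_schema(["a", "b"], {"a": "VARCHAR"}, all_text_mode=True)
later = batch_schema(
    ["a", "b"], {"a": "VARCHAR", "b": "VARCHAR"}, all_text_mode=True
)

# With the expected rule the two schemas match, so a downstream sort
# would see no schema change.
assert first == later

# The schema reported in the error above for the first batch differs.
reported_first = {"a": ("VARCHAR", "OPTIONAL"), "b": ("INT", "OPTIONAL")}
assert reported_first != later
```

Under this model the sort would see a single schema for every batch, which is why the query is expected to succeed.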

This is with the shipping Drill 1.13 JSON reader. I *think* this is fixed in 
the "batch size handling" JSON reader rewrite, but I've not tested it.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
