[
https://issues.apache.org/jira/browse/DRILL-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293557#comment-16293557
]
Paul Rogers edited comment on DRILL-6035 at 12/16/17 4:07 AM:
--------------------------------------------------------------
h4. All-Text Mode
Drill provides the ability to read scalar values as text:
{code}
ALTER SESSION SET `store.json.all_text_mode` = true
{code}
In this mode, JSON scalars are read as follows:
|| JSON Type || As Member Value || As Array Value ||
| Missing | NULL (VARCHAR) | N/A |
| null | NULL (VARCHAR) | String value "null" |
| true/false | "true"/"false" | Same |
| Number | Number text | Same |
| String | The string value (without quotes) | Same |
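The table can be read as a small conversion rule. The following Python sketch is only a model of that rule, not Drill's reader: the names {{to_all_text}} and {{read_record}} are hypothetical, and the parser hooks are used so that numbers keep their source text.

```python
import json


def to_all_text(value, in_array=False):
    """Convert one parsed JSON value to its all-text form, per the table."""
    if value is None:
        # JSON null: NULL as a member value, the string "null" inside an array.
        return "null" if in_array else None
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        # Strings, plus numbers that the parser hooks kept as source text.
        return value
    if isinstance(value, list):
        return [to_all_text(v, in_array=True) for v in value]
    if isinstance(value, dict):
        return {k: to_all_text(v) for k, v in value.items()}
    raise TypeError(f"unexpected JSON value: {value!r}")


def read_record(line, columns):
    """Parse one JSON object and project the requested columns as text."""
    # parse_int/parse_float=str keeps each number exactly as written.
    record = to_all_text(json.loads(line, parse_int=str, parse_float=str))
    # A missing column comes back as None, standing in for NULL (VARCHAR).
    return {c: record.get(c) for c in columns}
```

For example, {{read_record('{"a": 10, "e": [1, null, "x"]}', ["a", "e", "g"])}} yields text for {{a}}, a list of strings for {{e}}, and {{None}} for the missing column {{g}}.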
All-text mode can overcome some schema change exceptions such as:
* A long run of missing or null values before the first non-null value.
* Different scalar types in different records.
* Heterogeneous arrays.
* Arrays that contain nulls. (The null values are stored as empty strings.)
In Drill 1.13, in all-text mode, missing columns are presumed to be Nullable
VARCHAR. (Prior versions may have assumed Nullable INT.) As a result, if
file1.json has column `x`, but file2.json does not, then no schema change will
occur when combining the results since both files will assume that `x` is a
Nullable VARCHAR. (Note that this works only if the query explicitly projects
column `x`. It won't necessarily work for queries with the wildcard.)
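The reasoning above can be checked with a toy model. This Python sketch is not Drill code: {{projected_type}} and {{has_schema_change}} are hypothetical helpers, each file is reduced to the set of its column names, and a present column is assumed to hold only scalars.

```python
def projected_type(file_columns, column, missing_type="VARCHAR"):
    """Nullable type reported for an explicitly projected column in all-text mode."""
    # Present scalar columns are read as VARCHAR; a missing column gets
    # whatever type the reader presumes for it.
    return "VARCHAR" if column in file_columns else missing_type


def has_schema_change(files, column, missing_type="VARCHAR"):
    """True if the files disagree on the type of the projected column."""
    types = {projected_type(cols, column, missing_type) for cols in files}
    return len(types) > 1


# file1.json has column x, file2.json does not.
files = [{"x", "y"}, {"y"}]
```

With the default {{missing_type}}, {{has_schema_change(files, "x")}} is false. Passing {{missing_type="INT"}}, which models the behavior that prior versions may have had, makes the two files disagree.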
Note that all-text mode cannot overcome schema changes due to mixes of scalar
and structured (object or list) types.
> Specify Drill's JSON behavior
> -----------------------------
>
> Key: DRILL-6035
> URL: https://issues.apache.org/jira/browse/DRILL-6035
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Pritesh Maker
>
> Drill supports JSON as its native data format. However, experience suggests
> that Drill may have limitations in the JSON that Drill supports. This ticket
> asks to clarify Drill's expected behavior on various kinds of JSON.
> Topics to be addressed:
> * Relational vs. non-relational structures
> * JSON structures used in practice and how they map to Drill
> * Support for varying data types
> * Support for missing values, especially across files
> These topics are complex, hence the request to provide a detailed
> specification that clarifies what Drill does and does not support (or what
> it should and should not support).