[
https://issues.apache.org/jira/browse/DRILL-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293265#comment-16293265
]
Paul Rogers edited comment on DRILL-6035 at 12/15/17 9:46 PM:
--------------------------------------------------------------
Drill uses simple rules to infer scalar data types from JSON input:
* If the value is quoted, it is a Nullable {{VARCHAR}}. Example: {{a:
"varchar"}}
* If the value is numeric, and has no decimal point, it is a Nullable
{{BIGINT}}. Example: {{a: 10}}
* If the value is numeric, and has a decimal point, it is a Nullable
{{FLOAT8}}. Eample: {{a: 10.5}}
* If the value is {{true}} or {{false}}, it is a Nullable {{BIT}}, stored as
Nullable {{UINT1}}, with {{true}} stored as 1, {{false}} as 0. Example: {{a:
true}}
A special case occurs if the user enables {{store.json.all_text_mode}}. In this
case, all cases above give rise to a nullable {{VARCHAR}} column type.
h4. Nulls with Scalar Values
The rules for {{null}} are a bit more complex. The rules below apply to the
Drill 1.13 JSON reader, earlier versions require investigation.
* If the {{null}} occurs after seeing one of the above types, then the value is
a {{NULL}} of that type (such as a {{NULL}} {{VARCHAR}}.)
* If the {{null}} occurs at the start of the file, Drill defers identifying the
data type until it sees the first non-null value.
* If the entire file (or first batch of a file: 4K records before Drill 1.13,
variable number in Drill 1.13 or later) contains only {{null}} values, the
column is defined as nullable {{VARCHAR}}. (Drill 1.12 and earlier likely has
different behavior.)
For example, in the following example, Drill 1.13 identifies the type as
nullable {{BIGINT}}:
{code}
{a: null} {a: null} {a: null} {a: 10}
{code}
In the following example, Drill must guess that the data type is nullable
{{VARCHAR}}:
{code}
{a: null} {a: null} {a: null}
{code}
was (Author: paul.rogers):
Drill uses simple rules to infer scalar data types from JSON input:
* If the value is quoted, it is a Nullable {{VARCHAR}}. Example: {{a:
"varchar"}}
* If the value is numeric, and has no decimal point, it is a Nullable
{{BIGINT}}. Example: {{a: 10}}
* If the value is numeric, and has a decimal point, it is a Nullable
{{FLOAT8}}. Eample: {{a: 10.5}}
* If the value is {{true}} or {{false}}, it is a Nullable {{BIT}}, stored as
Nullable {{UINT1}}, with {{true}} stored as 1, {{false}} as 0. Example: {{a:
true}}
A special case occurs if the user enables {{store.json.all_text_mode}}. In this
case, all cases above give rise to a nullable {{VARCHAR}} column type.
The rules for {{null}} are a bit more complex. The rules below apply to the
Drill 1.13 JSON reader, earlier versions require investigation.
* If the {{null}} occurs after seeing one of the above types, then the value is
a {{NULL}} of that type (such as a {{NULL}} {{VARCHAR}}.)
* If the {{null}} occurs at the start of the file, Drill defers identifying the
data type until it sees the first non-null value.
* If the entire file (or first batch of a file: 4K records before Drill 1.13,
variable number in Drill 1.13 or later) contains only {{null}} values, the
column is defined as nullable {{VARCHAR}}. (Drill 1.12 and earlier likely has
different behavior.)
For example, in the following example, Drill 1.13 identifies the type as
nullable {{BIGINT}}:
{code}
{a: null} {a: null} {a: null} {a: 10}
{code}
In the following example, Drill must guess that the data type is nullable
{{VARCHAR}}:
{code}
{a: null} {a: null} {a: null}
{code}
> Specify Drill's JSON behavior
> -----------------------------
>
> Key: DRILL-6035
> URL: https://issues.apache.org/jira/browse/DRILL-6035
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Pritesh Maker
>
> Drill supports JSON as its native data format. However, experience suggests
> that Drill may have limitations in the JSON that Drill supports. This ticket
> asks to clarify Drill's expected behavior on various kinds of JSON.
> Topics to be addressed:
> * Relational vs. non-relational structures
> * JSON structures used in practice and how they map to Drill
> * Support for varying data types
> * Support for missing values, especially across files
> These topics are complex, hence the request to provide a detailed
> specifications that clarifies what Drill does and does not support (or what
> is should and should not support.)
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)