[jira] [Commented] (DRILL-6035) Specify Drill's JSON behavior

Paul Rogers (JIRA) Thu, 28 Dec 2017 11:41:17 -0800

    [ https://issues.apache.org/jira/browse/DRILL-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305690#comment-16305690 ]


Paul Rogers commented on DRILL-6035:
------------------------------------

h4. Drill’s Preferred JSON Format

By now it should be clear that JSON supports a huge variety of data formats, 
while Drill provides good support for one very specific format. The further 
the actual format deviates from Drill’s preference, the more trouble Drill has 
with it. (In this sense, Drill’s claim to be schema-free and to handle 
arbitrary JSON is more of an aspiration than a reality.)

Drill’s preferred JSON format is:

* Data is presented as a series of top-level objects, each of which 
corresponds to a Drill row.
* Every object has the same set of name/value pairs which correspond to Drill 
columns.
* Within the top-level object, keys are column names, values are the (scalar) 
value of that column.
* Every field has a single, fixed type.
* Fields with floating point numbers always include a decimal point (write 
`10.0`, not `10`).
* Null density is low. Specifically, the first batch of every file contains an 
actual value for every field. (That is, no long runs of null or missing 
columns.)
* If nested objects appear (singly, or in lists) they follow the same rules as 
the top-level object, and directly represent application data. (That is, data 
is not encoded in any fancy format.)
* Only single-dimension lists are allowed. Preferably, there is only a single 
tree of lists, which can be expanded with the `flatten()` function.

For example:

{code}
{ "id": 101, "name": "fred", "active": true, "balance": 123.45,
  "ship_address": {"line1": "301 Cobblestone Way", "city": "Bedrock"},
  "bill_address": {"line1": "345 Stonecave Road", "city": "Bedrock"}
}
{code}
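As a rough illustration of the rules above, the following Python sketch checks a stream of concatenated JSON objects against the top-level rules: a consistent set of keys, one type per field, no lists of lists, and at least one non-null value per column in the first batch. It is not part of Drill. The function names and `FIRST_BATCH_SIZE` are hypothetical, nested objects are not checked recursively, and the integer-versus-float distinction comes from Python's own parser, which reads `10` as an `int` and `10.0` as a `float`.

```python
import json
from typing import Any

# Hypothetical stand-in for Drill's "first batch"; real batch sizing is not modeled.
FIRST_BATCH_SIZE = 100


def json_kind(value: Any) -> str:
    """Map a parsed JSON value to a coarse type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):  # Python parses `10` as int, `10.0` as float
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "list"
    return "object"


def read_records(text: str) -> list:
    """Parse a stream of concatenated JSON objects (no enclosing array)."""
    decoder = json.JSONDecoder()
    records, pos = [], 0
    while True:
        # raw_decode() does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return records
        obj, pos = decoder.raw_decode(text, pos)
        records.append(obj)


def check_preferred_format(records: list) -> list:
    """Return human-readable violations of the top-level rules."""
    problems = []
    columns = None  # key set of the first object seen
    types = {}      # column name -> first non-null kind seen
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not an object")
            continue
        if columns is None:
            columns = set(rec)
        elif set(rec) != columns:
            problems.append(f"record {i}: columns differ from the first object")
        for name, value in rec.items():
            kind = json_kind(value)
            if kind == "list" and any(isinstance(v, list) for v in value):
                problems.append(f"record {i}: '{name}' is a multi-dimensional list")
            if kind != "null":
                prev = types.setdefault(name, kind)
                if prev != kind:
                    problems.append(f"record {i}: '{name}' changed type {prev} -> {kind}")
    # Null density: every column needs a non-null value in the first batch.
    batch = [r for r in records[:FIRST_BATCH_SIZE] if isinstance(r, dict)]
    for name in sorted(columns or ()):
        if all(r.get(name) is None for r in batch):
            problems.append(f"column '{name}': no value in the first batch")
    return problems


SAMPLE = """
{"id": 101, "name": "fred", "balance": 123.45,
 "ship_address": {"line1": "301 Cobblestone Way", "city": "Bedrock"}}
{"id": 102, "name": "barney", "balance": 10,
 "ship_address": {"line1": "303 Cobblestone Way", "city": "Bedrock"}}
"""

print(check_preferred_format(read_records(SAMPLE)))
```

On this illustrative sample the checker reports that `balance` changed from float to integer in the second record, because `10` lacks a decimal point; writing `10.0` instead makes the sample pass.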

Drill works best when the JSON was created to comply with the above rules. 
Conversely, these rules also describe the format that Drill itself produces 
when it writes JSON with CTAS (CREATE TABLE AS).
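The reverse direction, writing rows so that they follow the rules above, can be sketched in Python. This is not Drill's CTAS writer: the `write_rows` function and its `schema` argument are hypothetical, and the one-object-per-line layout and the use of `null` for missing values are choices of this sketch.

```python
import json


def write_rows(rows: list, schema: dict) -> str:
    """Serialize rows as JSON objects, one per line.

    `schema` (hypothetical) maps each column name to a Python type. Every row
    gets every column, in schema order; missing values become null. Float
    columns are coerced so that they always carry a decimal point.
    """
    lines = []
    for row in rows:
        obj = {}
        for name, ctype in schema.items():
            value = row.get(name)
            if ctype is float and value is not None:
                value = float(value)  # json.dumps(10.0) gives "10.0", never "10"
            obj[name] = value
        lines.append(json.dumps(obj) + "\n")
    return "".join(lines)


print(write_rows([{"id": 101, "balance": 10}, {"id": 102}],
                 {"id": int, "balance": float}), end="")
```

Driving the output from a fixed schema, rather than from whatever keys each row happens to have, is what keeps every object's key set and every field's type stable across rows.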

> Specify Drill's JSON behavior
> -----------------------------
>
>                 Key: DRILL-6035
>                 URL: https://issues.apache.org/jira/browse/DRILL-6035
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Assignee: Pritesh Maker
>
> Drill supports JSON as its native data format. However, experience suggests 
> that Drill may have limitations in the JSON it supports. This ticket 
> asks to clarify Drill's expected behavior on various kinds of JSON.
> Topics to be addressed:
> * Relational vs. non-relational structures
> * JSON structures used in practice and how they map to Drill
> * Support for varying data types
> * Support for missing values, especially across files
> These topics are complex, hence the request to provide a detailed 
> specification that clarifies what Drill does and does not support (or what 
> it should and should not support).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
