[jira] [Commented] (DRILL-6035) Specify Drill's JSON behavior

Paul Rogers (JIRA) Thu, 28 Dec 2017 11:38:15 -0800

[https://issues.apache.org/jira/browse/DRILL-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305685#comment-16305685]


Paul Rogers commented on DRILL-6035:
------------------------------------

h4. JSON as Drill’s Reference Data Model

As [described in the video on the Apache Drill home 
page|http://drill.apache.org], Drill takes JSON as its primary data model, since 
JSON is a superset of the relational model, Parquet, Avro, and other input formats.

Drill is a “schema-free” query engine because it is based on JSON's schema-free 
data model. Yet Drill's implementation is based on the relational model, which 
very much requires a schema.

The challenge, then, is how Drill represents a universal, non-relational data 
model within a relational implementation. This is not a trivial question. In 
fact, there is no good answer. (Many projects faced the same issue with XML; 
few invented good solutions.)

At present, using JSON as the reference data model for a relational engine is 
more an aspiration than a working reality. Drill has no specification of the 
theory (or rules, or implementation) for how it maps JSON to relations (that is, 
to value vectors). Instead, each data source works out an implementation as best 
it can. This leaves gaps, which we explore here.
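As an illustration of the kind of rules such a specification would need to settle, here is a minimal sketch in Python. It is not Drill's actual JSON reader. It scans newline-delimited JSON records, collects the types seen for each top-level field, and reports the fields for which a naive one-type-per-column mapping breaks down. The type labels (bigint, double, varchar, array, map), the "missing" marker, the sample data, and the helper names are all hypothetical, chosen only to echo Drill's SQL types; a real reader might also apply widening rules (such as BIGINT to DOUBLE) that this sketch deliberately omits.

```python
import json
from typing import Any, Dict, List, Set


def json_type(value: Any) -> str:
    """Map a parsed JSON value to a coarse, Drill-like type label (illustrative only)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool first: in Python, bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "varchar"
    if isinstance(value, list):
        return "array"
    return "map"  # a JSON object


def infer_columns(lines: List[str]) -> Dict[str, Set[str]]:
    """Collect, per top-level field, every type observed across all records.

    A field that is absent from a record is recorded as "missing".
    """
    records = [json.loads(line) for line in lines]
    names: List[str] = []  # field names in first-seen order
    for rec in records:
        for key in rec:
            if key not in names:
                names.append(key)
    columns: Dict[str, Set[str]] = {name: set() for name in names}
    for rec in records:
        for name in names:
            columns[name].add(json_type(rec[name]) if name in rec else "missing")
    return columns


def conflicts(columns: Dict[str, Set[str]]) -> Dict[str, str]:
    """Report fields that cannot be given exactly one non-null column type."""
    problems: Dict[str, str] = {}
    for name, types in columns.items():
        concrete = types - {"null", "missing"}
        if len(concrete) > 1:
            problems[name] = "type conflict: " + ", ".join(sorted(concrete))
        elif not concrete:
            problems[name] = "type unknown (only null/missing seen)"
    return problems


# Hypothetical sample: three newline-delimited JSON records.
SAMPLE = [
    '{"id": 1, "price": 10, "tags": ["a"]}',
    '{"id": 2, "price": 10.5}',
    '{"id": "3", "price": null, "note": null}',
]

if __name__ == "__main__":
    print(conflicts(infer_columns(SAMPLE)))
```

On the sample, the sketch flags `id` (a number in some records, a string in another), `price` (integer vs. fractional values), and `note` (no concrete type ever observed), while `tags` is merely absent from some records. Each flagged case is a question that a specification of Drill's JSON behavior would have to answer.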

The fundamental problem is that JSON is universal: all structures are legal. 
Relational theory is based on tables (or, with extensions, on sets of nested 
tables). [SQL++|https://arxiv.org/abs/1405.3631] is one attempt to extend SQL 
to “semi-structured” data:

bq. The SQL++ semi-structured data model is a superset of both JSON and the SQL data model. SQL++ offers powerful computational capabilities for processing semi-structured data akin to prior non-relational query languages, notably OQL and XQuery.

Our goal here is not to debate the merits of one system versus another. Rather, 
we simply note that the JSON data model is a superset of the standard SQL 
(relational) data model, and that importing JSON into Drill is therefore not a 
trivial exercise.
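The superset relationship is visible even within a single value. The following Python sketch (again hypothetical, not Drill code) computes a type signature for a parsed JSON value and returns None when the value has no single nested-relational type, for example an array that mixes numbers and strings. The rules are assumptions and deliberately strict: null array elements are ignored, all numbers share one "number" type, and two maps in the same array must have identical fields and types. A more forgiving reader might instead merge such maps, which is exactly the kind of rule a specification would need to pin down.

```python
from typing import Any, Optional


def signature(value: Any) -> Optional[str]:
    """Return a nested-relational type signature for a parsed JSON value.

    Returns None when the value has no single such type (hypothetical, strict rules).
    """
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool before numbers: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"  # this sketch does not distinguish integers from floats
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Null elements are ignored; all other elements must share one signature.
        elems = {signature(v) for v in value if v is not None}
        if None in elems or len(elems) > 1:
            return None
        return "array<" + (elems.pop() if elems else "unknown") + ">"
    # A JSON object: every field must itself be representable.
    fields = []
    for key in sorted(value):
        sig = signature(value[key])
        if sig is None:
            return None
        fields.append(f"{key}: {sig}")
    return "map{" + ", ".join(fields) + "}"
```

For example, `signature([1, "a"])` returns None, while `signature({"b": [1], "a": "x"})` returns `"map{a: string, b: array<number>}"`. Arrays of arrays whose inner element types differ, such as `[[1], ["a"]]`, are also rejected.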

> Specify Drill's JSON behavior
> -----------------------------
>
>                 Key: DRILL-6035
>                 URL: https://issues.apache.org/jira/browse/DRILL-6035
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Assignee: Pritesh Maker
>
> Drill supports JSON as its native data format. However, experience suggests 
> that Drill's JSON support may have limitations. This ticket asks to clarify 
> Drill's expected behavior on various kinds of JSON.
> Topics to be addressed:
> * Relational vs. non-relational structures
> * JSON structures used in practice and how they map to Drill
> * Support for varying data types
> * Support for missing values, especially across files
> These topics are complex, hence the request for a detailed specification 
> that clarifies what Drill does and does not support (or what it should and 
> should not support).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
