[jira] [Updated] (DRILL-3526) Drill proper DESCRIBE support for JSON

Hari Sekhon (JIRA) Tue, 21 Jul 2015 06:15:38 -0700

     [ 
https://issues.apache.org/jira/browse/DRILL-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hari Sekhon updated DRILL-3526:
-------------------------------
    Description: 
Request to add full DESCRIBE support for JSON files.

Currently the DESCRIBE command prints a blank table instead of the schema,
which is unhelpful, so I run a SELECT * with LIMIT 1 instead.

Although describing a large amount of JSON data could be inefficient, I propose
the following solution:

Read JSON records until a threshold of a few thousand records, or a few tens of
thousands of fields, has been read without discovering any new fields, and then
assume that the fields seen so far are the schema.

Extend the DESCRIBE command with a user-configurable number of records / fields
to read (or rather, the number of records / fields to read without discovering
any new fields) to present a merged schema for the data source, as well as an
ALL keyword to scan all JSON files and records to create a true global schema.

In case of schema evolution, it might be a good idea to read both the newest
and the oldest JSON files.

  was:
Request to add full DESCRIBE support for JSON files.

Currently the DESCRIBE command prints a blank table instead of the schema,
which is unhelpful, so I run a SELECT * with LIMIT 1 instead.

Although describing a large amount of JSON data could be inefficient, I propose
the following solution:

Read JSON records until a threshold of a few thousand records, or a few tens of
thousands of fields, has been read without discovering any new fields, and then
assume that the fields seen so far are the schema.

Extend the DESCRIBE command with a user-configurable number of records / fields
to read (or rather, the number of records / fields to read without discovering
any new fields) to present a merged schema for the data source, as well as an
ALL keyword to scan all JSON files and records to create a true global schema.


> Drill proper DESCRIBE support for JSON
> --------------------------------------
>
>                 Key: DRILL-3526
>                 URL: https://issues.apache.org/jira/browse/DRILL-3526
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata, Storage - JSON
>    Affects Versions: 1.1.0
>            Reporter: Hari Sekhon
>            Assignee: Steven Phillips
>
> Request to add full DESCRIBE support for JSON files.
> Currently the DESCRIBE command prints a blank table instead of the schema,
> which is unhelpful, so I run a SELECT * with LIMIT 1 instead.
> Although describing a large amount of JSON data could be inefficient, I
> propose the following solution:
> Read JSON records until a threshold of a few thousand records, or a few tens
> of thousands of fields, has been read without discovering any new fields, and
> then assume that the fields seen so far are the schema.
> Extend the DESCRIBE command with a user-configurable number of records /
> fields to read (or rather, the number of records / fields to read without
> discovering any new fields) to present a merged schema for the data source,
> as well as an ALL keyword to scan all JSON files and records to create a
> true global schema.
> In case of schema evolution, it might be a good idea to read both the newest
> and the oldest JSON files.
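
The early-stopping schema discovery proposed above could be sketched roughly as
follows. This is only an illustration in Python, not Drill code: the function
name infer_schema and the parameter max_quiet_records are hypothetical, it
assumes one JSON object per line, and it only merges top-level field names and
their JSON value types. Passing None for the threshold stands in for the
proposed ALL keyword.

```python
import json
from typing import Dict, Iterable, Optional, Set


def infer_schema(
    lines: Iterable[str],
    max_quiet_records: Optional[int] = 1000,  # hypothetical knob; None means "scan ALL"
) -> Dict[str, Set[str]]:
    """Merge top-level field names and value types across records.

    Stops once `max_quiet_records` consecutive records have been read
    without discovering a field name that was not seen before.
    """
    schema: Dict[str, Set[str]] = {}
    quiet = 0  # records read since the last new field was discovered
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)  # assumed to be a JSON object
        found_new = False
        for name, value in record.items():
            if name not in schema:
                schema[name] = set()
                found_new = True
            # Record every type observed for this field (merged schema).
            schema[name].add(type(value).__name__)
        quiet = 0 if found_new else quiet + 1
        if max_quiet_records is not None and quiet >= max_quiet_records:
            break  # assume the schema has stabilised
    return schema
```

A small threshold trades accuracy for speed: fields that first appear after the
cut-off are missed, which is why a full scan remains useful as an option.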



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
