[ 
https://issues.apache.org/jira/browse/DRILL-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635069#comment-14635069
 ] 

Neeraja commented on DRILL-3526:
--------------------------------

This is certainly a useful feature.
For detecting schema, one option would be a command (possibly a modified 
version of DESCRIBE itself) that prints ALL the different schemas present in 
the JSON data. Because JSON allows new fields to appear in any record and 
fields to change type between records, seeing the various schemas present 
would be a great start for data exploration. 
It would of course need to be complemented by the sampling-based solution 
suggested in the description, because BI/SQL tools need a single schema.
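
To illustrate the "print ALL the different schemas" idea, here is a minimal 
Python sketch. It is not Drill code: the function names are hypothetical, and 
it assumes newline-delimited JSON whose records are top-level objects, with 
Python type names standing in for SQL types.

```python
import json
from collections import Counter

def record_schema(record):
    """Return a hashable (field, type-name) signature for one JSON object."""
    return tuple(sorted((name, type(value).__name__)
                        for name, value in record.items()))

def distinct_schemas(lines):
    """Count how many records share each distinct schema signature."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            counts[record_schema(json.loads(line))] += 1
    return counts

# Hypothetical sample: "id" changes type once and "tags" appears once.
sample = [
    '{"id": 1, "name": "a"}',
    '{"id": 2, "name": "b"}',
    '{"id": "3", "name": "c"}',
    '{"id": 4, "name": "d", "tags": ["x"]}',
]
schemas = distinct_schemas(sample)
for schema, count in schemas.items():
    print(count, dict(schema))
```

Each distinct signature is reported with the number of records that match it, 
so rare schema variants stand out during exploration.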

> Drill proper DESCRIBE support for JSON
> --------------------------------------
>
>                 Key: DRILL-3526
>                 URL: https://issues.apache.org/jira/browse/DRILL-3526
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata, Storage - JSON
>    Affects Versions: 1.1.0
>            Reporter: Hari Sekhon
>            Assignee: Steven Phillips
>
> Request to add full DESCRIBE support for JSON files.
> Currently the DESCRIBE command prints a blank table instead of the schema, 
> which is unhelpful, so I run a select * limit 1 instead.
> While describing large amounts of JSON data could be inefficient, I propose 
> the following solution:
> Read JSON records until a threshold of a few thousand records or a few tens 
> of thousands of fields has been read without discovering any new fields, and 
> then assume that is the schema.
> Extend the DESCRIBE command with a user-configurable number of records / 
> fields to read (or rather, the number of records / fields to read without 
> discovering any new fields) to present a merged schema for the data source, 
> as well as an ALL keyword to scan all JSON files and records to create a 
> true global schema.
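
The sampling approach in the description above could be sketched roughly as 
follows. This is an illustration in Python, not Drill's implementation: 
merged_schema, quiet_limit and scan_all are hypothetical names, the input is 
assumed to be newline-delimited JSON objects, and only top-level fields are 
considered. A type change on an already-known field is recorded but does not 
reset the counter, since the proposal only mentions new fields.

```python
import json

def merged_schema(lines, quiet_limit=1000, scan_all=False):
    """Merge field -> set of type names across records.

    Stops once quiet_limit consecutive records add no new field,
    unless scan_all is True (the proposed ALL keyword).
    """
    schema = {}
    quiet = 0          # consecutive records without a new field
    records_read = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        records_read += 1
        found_new = False
        for name, value in record.items():
            if name not in schema:
                schema[name] = set()
                found_new = True
            schema[name].add(type(value).__name__)
        quiet = 0 if found_new else quiet + 1
        if not scan_all and quiet >= quiet_limit:
            break
    return schema, records_read

# Hypothetical sample: field "c" only appears after a quiet stretch.
sample = ['{"a": 1}', '{"a": 2, "b": "x"}', '{"a": 3}', '{"a": 4}', '{"c": true}']
sampled, sampled_count = merged_schema(sample, quiet_limit=2)
full, full_count = merged_schema(sample, scan_all=True)
```

With quiet_limit=2 the sampled run stops before reaching field "c", while 
scan_all=True reads every record. That is the trade-off between the 
threshold-based schema and the true global schema.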



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
