[ 
https://issues.apache.org/jira/browse/DRILL-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995665#comment-14995665
 ] 

Hari Sekhon commented on DRILL-3524:
------------------------------------

I already addressed the schema issue in the original post.

I've also suggested workarounds such as scanning the first few docs and 
returning whatever fields are found.

Right now I have to run a sample query, which is much harder for the human eye 
to parse than a describe function that provides a condensed, unique list of 
discovered fields. Such a function would also let Drill engineers determine the 
best performance / schema completeness trade-off via testing, and give the user 
a more comprehensive schema description via the keyword ALL (full scan).
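
To make the suggestion concrete, here is a minimal sketch of the sampling idea 
in Python. It is not Drill code and says nothing about how Drill's MongoDB 
storage plugin works internally. The names describe_fields and sample_size are 
hypothetical, the default of 24 docs is just one reading of "a few dozen", and 
reporting ANY for fields whose sampled values disagree on type is my own 
assumption. Passing sample_size=None stands in for the proposed ALL keyword.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Optional


def describe_fields(docs: Iterable[Dict[str, Any]],
                    sample_size: Optional[int] = 24) -> Dict[str, str]:
    """Merge the top-level field names seen in the sampled documents.

    sample_size=None is a stand-in for the proposed ALL keyword (full scan).
    Returns {field_name: type_name}; 'ANY' marks fields whose sampled
    values had more than one Python type (an assumption of this sketch).
    """
    # Only look at the first sample_size docs unless a full scan is requested.
    sampled = docs if sample_size is None else islice(docs, sample_size)
    schema: Dict[str, str] = {}
    for doc in sampled:
        for name, value in doc.items():
            type_name = type(value).__name__
            if name not in schema:
                schema[name] = type_name      # first time this field is seen
            elif schema[name] != type_name:
                schema[name] = "ANY"          # conflicting types across docs
    return schema


# Hypothetical documents with a flexible schema.
docs = [
    {"_id": 1, "name": "a"},
    {"_id": 2, "name": "b", "tags": ["x"]},
    {"_id": "three", "email": "c@example.com"},
]

print(describe_fields(docs, sample_size=2))     # partial schema from a sample
print(describe_fields(docs, sample_size=None))  # full scan, i.e. "ALL"
```

The sampled call misses the email field, which only the full scan finds. That 
is exactly the performance / schema completeness trade-off described above. 
Against a real collection, the docs argument could presumably be a cursor, 
for example PyMongo's collection.find(), but I have not tested that here.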

Considering the current 'describe' command returns completely useless output, 
any of the suggestions in my original post would be better than the current 
implementation.

> Drill proper DESCRIBE support for MongoDB
> -----------------------------------------
>
>                 Key: DRILL-3524
>                 URL: https://issues.apache.org/jira/browse/DRILL-3524
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata, Storage - MongoDB
>    Affects Versions: 1.1.0
>            Reporter: Hari Sekhon
>             Fix For: Future
>
>
> Request to add full DESCRIBE support for MongoDB collections.
> I understand this may be difficult / sub-optimal due to the flexible schema 
> nature of Mongo docs but if you can tabulate results when reading directly 
> from MongoDB for which you have read the field names, then it's also possible 
> to extract all field names to present for the describe command, albeit an 
> inefficient scan to do so.
> Currently describe returns pseudo / inaccurate / unhelpful metadata:
> {code}+--------------+------------+--------------+
> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> +--------------+------------+--------------+
> | *            | ANY        | YES          |
> +--------------+------------+--------------+{code}
> Perhaps you could extend DESCRIBE to scan the first few dozen docs by default 
> to create a merged schema, and add an optional argument to the describe 
> command for scanning a user-specified number of docs from which to describe 
> the schema, or an ALL keyword that scans all docs in a collection to get the 
> complete global schema for the collection?
> In case of schema evolution it might be an interesting option to additionally 
> read the newest and oldest records, maybe the first and last records by ID 
> etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
