[jira] [Updated] (DRILL-3525) Drill proper DESCRIBE support for Parquet

Hari Sekhon (JIRA) Tue, 21 Jul 2015 06:13:21 -0700

     [ https://issues.apache.org/jira/browse/DRILL-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Hari Sekhon updated DRILL-3525:
-------------------------------
    Description: 
Request to add full DESCRIBE support for Parquet.

Currently the DESCRIBE command prints an empty table instead of the schema, 
which is unhelpful, so I run a SELECT * with LIMIT 1 instead.

Since describing a large amount of Parquet data could be inefficient, I propose 
the following solution:

Read the first Parquet file and assume its schema applies to the whole data 
source. Extend the DESCRIBE command with a user-configurable number of Parquet 
files to read, so that it presents a merged schema for the data source, and add 
an ALL keyword that scans every Parquet file to build a true global schema.
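A minimal Python sketch of the proposed merge, assuming each file's footer schema has already been read into a column-name-to-type mapping. The function name merged_schema and the "ALL" string sentinel are hypothetical illustrations, not Drill syntax or API:

```python
from typing import Dict, List, Union

# Hypothetical input: one dict per Parquet file, in read order, mapping
# column name -> Parquet type name. No real Parquet reader is used here.
FileSchema = Dict[str, str]


def merged_schema(
    file_schemas: List[FileSchema], limit: Union[int, str] = 1
) -> Dict[str, List[str]]:
    """Merge the schemas of the first `limit` files, or of all files if limit == "ALL".

    The default limit of 1 matches "read the first file and assume that is
    the schema". Each column maps to the distinct types seen, in first-seen
    order, so a type conflict between files stays visible.
    """
    if limit == "ALL":
        sample = file_schemas
    elif isinstance(limit, int) and limit >= 1:
        sample = file_schemas[:limit]
    else:
        raise ValueError("limit must be a positive integer or 'ALL'")

    merged: Dict[str, List[str]] = {}
    for schema in sample:
        for column, col_type in schema.items():
            seen = merged.setdefault(column, [])
            if col_type not in seen:
                seen.append(col_type)
    return merged


if __name__ == "__main__":
    files = [
        {"id": "INT64", "name": "BINARY"},
        {"id": "INT64", "name": "BINARY", "ts": "INT96"},
    ]
    print(merged_schema(files))         # first file only
    print(merged_schema(files, "ALL"))  # every file
```

With these two example files, reading only the first file misses the ts column, while ALL returns id, name and ts. Keeping a list of types per column, rather than picking one, is a design choice so that conflicting files surface in the output instead of being silently resolved.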

To account for schema evolution, DESCRIBE could read both the newest and the 
oldest Parquet files.
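One way to act on this, sketched in Python under the assumption that file modification time tracks write order (it may not if files were copied or rewritten): pick the oldest and newest files, then compare their schemas. oldest_and_newest and schema_drift are hypothetical helpers, and reading each file's footer into a column-to-type dict is left out:

```python
import os
from typing import Dict, List, Tuple

# Hypothetical schema representation: column name -> Parquet type name.
FileSchema = Dict[str, str]


def oldest_and_newest(paths: List[str]) -> Tuple[str, str]:
    """Pick the oldest and newest Parquet files by modification time.

    Assumes mtime reflects write order, which may not hold if files were
    copied or rewritten after the fact.
    """
    if not paths:
        raise ValueError("no Parquet files to inspect")
    by_mtime = sorted(paths, key=os.path.getmtime)
    return by_mtime[0], by_mtime[-1]


def schema_drift(old: FileSchema, new: FileSchema) -> Dict[str, List[str]]:
    """Summarise how the newest file's schema differs from the oldest one's."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "retyped": sorted(c for c in old.keys() & new.keys() if old[c] != new[c]),
    }


if __name__ == "__main__":
    old = {"id": "INT32", "name": "BINARY"}
    new = {"id": "INT64", "ts": "INT96"}
    # {'added': ['ts'], 'removed': ['name'], 'retyped': ['id']}
    print(schema_drift(old, new))
```

Reading only the two ends is cheap, but it can miss a column that existed only in files written in between; the configurable file count or ALL would still be needed to catch that.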



> Drill proper DESCRIBE support for Parquet
> -----------------------------------------
>
>                 Key: DRILL-3525
>                 URL: https://issues.apache.org/jira/browse/DRILL-3525
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata, Storage - Parquet
>    Affects Versions: 1.1.0
>            Reporter: Hari Sekhon
>            Assignee: Steven Phillips
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
