[ https://issues.apache.org/jira/browse/DRILL-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hari Sekhon updated DRILL-3525:
-------------------------------
Description:
Request to add full DESCRIBE support for Parquet.
Currently the DESCRIBE command prints a blank table instead of the schema,
which is unhelpful, so I run a SELECT * with LIMIT 1 instead.
While describing large amounts of Parquet data could be inefficient, I propose
the following solution:
Read the first Parquet file and assume its schema applies to the whole data
source. Extend the DESCRIBE command to take a user-configurable number of
Parquet files to read and present a merged schema for the data source, as well
as an ALL keyword to scan all Parquet files and create a true global schema.
In case of schema evolution, you could try reading the newest and oldest
Parquet files.
was:
Request to add full DESCRIBE support for Parquet.
Currently the DESCRIBE command prints a blank table instead of the schema,
which is unhelpful, so I run a SELECT * with LIMIT 1 instead.
While describing large amounts of Parquet data could be inefficient, I propose
the following solution:
Read the first Parquet file and assume its schema applies to the whole data
source. Extend the DESCRIBE command to take a user-configurable number of
Parquet files to read and present a merged schema for the data source, as well
as an ALL keyword to scan all Parquet files and create a true global schema.
> Drill proper DESCRIBE support for Parquet
> -----------------------------------------
>
> Key: DRILL-3525
> URL: https://issues.apache.org/jira/browse/DRILL-3525
> Project: Apache Drill
> Issue Type: Bug
> Components: Metadata, Storage - Parquet
> Affects Versions: 1.1.0
> Reporter: Hari Sekhon
> Assignee: Steven Phillips
>
> Request to add full DESCRIBE support for Parquet.
> Currently the DESCRIBE command prints a blank table instead of the schema,
> which is unhelpful, so I run a SELECT * with LIMIT 1 instead.
> While describing large amounts of Parquet data could be inefficient, I
> propose the following solution:
> Read the first Parquet file and assume its schema applies to the whole data
> source. Extend the DESCRIBE command to take a user-configurable number of
> Parquet files to read and present a merged schema for the data source, as
> well as an ALL keyword to scan all Parquet files and create a true global
> schema.
> In case of schema evolution, you could try reading the newest and oldest
> Parquet files.
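
The proposal above amounts to sampling some number of Parquet files and merging
their schemas. The following is a minimal Python sketch of that idea only; it is
not Drill code and does not read real Parquet files. Each file's schema is
modelled as a plain dict of column name to type name, and the names
merge_schemas, describe, sample_size, and the "ANY" marker for conflicting
types are all hypothetical, chosen for illustration. Passing sample_size=None
stands in for the proposed ALL keyword.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical stand-in for a Parquet file's schema: column name -> type name.
Schema = Dict[str, str]


def merge_schemas(schemas: Iterable[Schema]) -> Schema:
    """Union the columns of several schemas, keeping first-seen column order.

    If two files disagree on a column's type, the column is marked "ANY".
    That marker is an assumption of this sketch, not a rule from the issue.
    """
    merged: Schema = {}
    for schema in schemas:
        for column, col_type in schema.items():
            if column not in merged:
                merged[column] = col_type
            elif merged[column] != col_type:
                merged[column] = "ANY"
    return merged


def describe(file_schemas: List[Schema],
             sample_size: Optional[int] = 1) -> Schema:
    """Return a merged schema from the first `sample_size` files.

    sample_size=1 mirrors "read the first Parquet file and assume that is
    the schema"; sample_size=None scans every file, like the proposed ALL.
    """
    if sample_size is None:
        sample = file_schemas
    else:
        sample = file_schemas[:sample_size]
    return merge_schemas(sample)


def describe_oldest_and_newest(file_schemas: List[Schema]) -> Schema:
    """Schema-evolution variant: merge only the oldest and newest files.

    Assumes the list is ordered oldest to newest.
    """
    if not file_schemas:
        return {}
    return merge_schemas([file_schemas[0], file_schemas[-1]])
```

The trade-off is the one the issue describes: a small sample_size is cheap but
can miss columns that appear only in later files, while scanning everything
gives the global schema at the cost of touching every file.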