[
https://issues.apache.org/jira/browse/HAWQ-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Goden Yao updated HAWQ-450:
---------------------------
Fix Version/s: backlog
> Schema auto discovery on HDFS
> -----------------------------
>
> Key: HAWQ-450
> URL: https://issues.apache.org/jira/browse/HAWQ-450
> Project: Apache HAWQ
> Issue Type: New Feature
> Components: PXF
> Reporter: Shivram Mani
> Assignee: Goden Yao
> Labels: gsoc2016
> Fix For: backlog
>
>
> File formats such as Avro and JSON carry the schema information along with
> the data. For other formats such as text/CSV, schema inference is a bit more
> complex. This work can be broken down into individual subtasks corresponding
> to specific file formats.
> Introduce additional parameters in the PXF API, inferSchema and header, in
> order to auto-discover the schema.
> Spark provides a similar option, e.g. https://github.com/databricks/spark-csv
> provides options for schema inference.
> The idea is to eventually expose metadata information of the underlying file
> on HDFS via the /getMetadata API (https://issues.apache.org/jira/browse/HAWQ-459).
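
For illustration only, here is a minimal sketch of what header-aware schema inference for text/CSV could look like. It is not PXF code. The function names (infer_type, infer_schema), the sample_rows parameter, and the type names "int", "float" and "text" are assumptions made for this example. Only the header option mirrors a parameter named in the description; inferSchema would be the switch that turns such a routine on.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest type name that fits every non-empty value.

    The type names are placeholders for this sketch, not HAWQ column types.
    """
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"
    # Try the narrowest type first, then widen.
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"


def infer_schema(text, header=True, sample_rows=100):
    """Infer (column name, type name) pairs from CSV text.

    header:      treat the first row as column names (hypothetical analogue
                 of the proposed 'header' parameter).
    sample_rows: number of data rows to inspect (assumption of this sketch).
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    if header:
        names, data = rows[0], rows[1:]
    else:
        # No header row: generate positional column names.
        names = [f"col{i}" for i in range(len(rows[0]))]
        data = rows
    data = data[:sample_rows]
    schema = []
    for i, name in enumerate(names):
        # Skip rows that are too short to contain this column.
        column = [row[i] for row in data if i < len(row)]
        schema.append((name, infer_type(column)))
    return schema
```

Sampling only the first rows keeps inference cheap on large HDFS files, at the cost of possibly choosing a type that a later row violates. A real implementation would need a fallback, such as widening the column to text.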