[ https://issues.apache.org/jira/browse/HAWQ-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Goden Yao updated HAWQ-450:
---------------------------
    Fix Version/s: backlog

> Schema auto discovery on HDFS
> -----------------------------
>
>                 Key: HAWQ-450
>                 URL: https://issues.apache.org/jira/browse/HAWQ-450
>             Project: Apache HAWQ
>          Issue Type: New Feature
>          Components: PXF
>            Reporter: Shivram Mani
>            Assignee: Goden Yao
>              Labels: gsoc2016
>             Fix For: backlog
>
>
> File formats such as Avro and JSON have the schema information along with the
> data. For other formats such as text/CSV, schema inference is more complex.
> This can be broken down into individual subtasks corresponding to specific
> file formats.
> Introduce additional parameters in the PXF API, inferSchema and header, in
> order to auto-discover the schema.
> Spark provides a similar option: e.g. https://github.com/databricks/spark-csv
> provides options for schema inference.
> The idea is to eventually expose metadata information of the underlying file
> on HDFS via the /getMetadata API (https://issues.apache.org/jira/browse/HAWQ-459).
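As a rough illustration of what text/CSV inference controlled by header and inferSchema flags might involve, here is a minimal Python sketch. The function infer_csv_schema, its type-widening rules, the HAWQ-style type names it emits (bool, int8, float8, text) and the default column names c0, c1, ... are all hypothetical choices for this example. They are not part of the PXF API or of spark-csv.

```python
import csv
import io


def _infer_value(value):
    """Guess a type for a single non-empty field (hypothetical rules)."""
    if value.lower() in ("true", "false"):
        return "bool"
    try:
        int(value)
        return "int8"
    except ValueError:
        pass
    try:
        float(value)
        return "float8"
    except ValueError:
        return "text"


def _merge(current, new):
    """Widen two guessed types: int8 + float8 -> float8, any other conflict -> text."""
    if current is None:
        return new
    if current == new:
        return current
    if {current, new} == {"int8", "float8"}:
        return "float8"
    return "text"


def infer_csv_schema(text, header=True, infer_schema=True, delimiter=","):
    """Return [(column_name, type)] for CSV text (hypothetical helper).

    header:       first row holds column names; otherwise use c0, c1, ...
    infer_schema: scan data rows for types; otherwise every column is text.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    if header:
        names, data = rows[0], rows[1:]
    else:
        names = ["c%d" % i for i in range(len(rows[0]))]
        data = rows
    types = [None] * len(names)
    if infer_schema:
        for row in data:
            # Extra fields beyond the known columns are ignored.
            for i, value in enumerate(row[:len(names)]):
                if value == "":  # treat empty fields as NULL: no type evidence
                    continue
                types[i] = _merge(types[i], _infer_value(value))
    # Columns with no evidence (or no inference) fall back to text.
    return [(name, t or "text") for name, t in zip(names, types)]


if __name__ == "__main__":
    sample = "id,price,active,name\n1,9.99,true,apple\n2,10,false,pear\n"
    print(infer_csv_schema(sample))
    # [('id', 'int8'), ('price', 'float8'), ('active', 'bool'), ('name', 'text')]
```

Widening to the most general type seen across all sampled rows keeps the guess safe for every value scanned. A real implementation would likely sample only a bounded number of rows from HDFS and would need richer types (dates, timestamps), which this sketch leaves out.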



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
