[ 
https://issues.apache.org/jira/browse/DRILL-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948846#comment-14948846
 ] 

Jacques Nadeau commented on DRILL-951:
--------------------------------------

It is definitely embarrassing that we don't have this functionality already. 
The extension points exist inside the code. What needs to be done:

 - Implement a new org.apache.drill.exec.store.easy.text.compliant.TextOutput 
that outputs a set of VarChar vectors
 - Do a two-phase read where each reader reads the file header before moving 
to its split
 - Expose the functionality (initially) as a property of the TextReaderConfig
 - Update the text reader's handling of projection pushdown
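
To make the two-phase read and the projection step concrete, here is a rough 
sketch in plain Java. It does not use any Drill classes: the class name 
HeaderAwareSplitReader and its methods are hypothetical, the file is held in 
memory as a byte array, and fields are split on bare commas with no quote 
handling. It only illustrates the idea that every reader first reads line one 
of the file for the column names, then reads the records that start inside 
its own byte range.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Drill's text reader.
public class HeaderAwareSplitReader {

    // Naive field splitting: strips a trailing '\r' and splits on commas.
    // A real CSV parser would also have to handle quoted fields.
    static String[] splitLine(byte[] file, int from, int to) {
        String line = new String(file, from, to - from, StandardCharsets.UTF_8);
        if (line.endsWith("\r")) {
            line = line.substring(0, line.length() - 1);
        }
        return line.split(",", -1);
    }

    // Index of the first byte after the next '\n' at or after 'from'.
    static int nextLineStart(byte[] file, int from) {
        int i = from;
        while (i < file.length && file[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, file.length);
    }

    // Phase 1: every reader reads the first line of the file,
    // no matter where its own split begins.
    static String[] readHeader(byte[] file) {
        int end = 0;
        while (end < file.length && file[end] != '\n') {
            end++;
        }
        return splitLine(file, 0, end);
    }

    // Phase 2: read the records that start inside [start, end).
    static List<String[]> readSplit(byte[] file, int start, int end) {
        int pos;
        if (start == 0) {
            // The first split owns the header line, so skip it.
            pos = nextLineStart(file, 0);
        } else if (file[start - 1] != '\n') {
            // The split begins mid-record; that record belongs to the
            // previous split, so move to the next line start.
            pos = nextLineStart(file, start);
        } else {
            pos = start;
        }
        List<String[]> rows = new ArrayList<>();
        while (pos < end && pos < file.length) {
            int lineEnd = pos;
            while (lineEnd < file.length && file[lineEnd] != '\n') {
                lineEnd++;
            }
            rows.add(splitLine(file, pos, lineEnd));
            pos = lineEnd + 1;
        }
        return rows;
    }

    // Projection: map requested column names to positions in the header.
    static int[] resolveProjection(String[] header, String... wanted) {
        int[] indices = new int[wanted.length];
        for (int w = 0; w < wanted.length; w++) {
            indices[w] = -1;
            for (int h = 0; h < header.length; h++) {
                if (header[h].equals(wanted[w])) {
                    indices[w] = h;
                    break;
                }
            }
            if (indices[w] < 0) {
                throw new IllegalArgumentException("No such column: " + wanted[w]);
            }
        }
        return indices;
    }
}
```

In this sketch a record belongs to the split in which it starts, so a reader 
may read past the end of its range to finish its last record, and a reader 
whose range begins mid-record skips ahead to the next line. With the header 
available to every reader, a projected column name can be resolved to a 
position once and only those fields need to be written to the output vectors.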

> CSV header row should be parsed
> -------------------------------
>
>                 Key: DRILL-951
>                 URL: https://issues.apache.org/jira/browse/DRILL-951
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Text & CSV
>            Reporter: Tomer Shiran
>             Fix For: Future
>
>
> CSV reader is currently treating header names like regular rows. There should 
> be a way to treat the header row as the column names (optional?).
> I exported this dataset to a CSV: 
> https://data.sfgov.org/Public-Safety/SFPD-Incidents-Previous-Three-Months/tmnf-yvry



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
