[
https://issues.apache.org/jira/browse/DRILL-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948846#comment-14948846
]
Jacques Nadeau commented on DRILL-951:
--------------------------------------
It is definitely embarrassing that we don't have this functionality already.
The extension points exist inside the code. What needs to be done:
- Implement a new org.apache.drill.exec.store.easy.text.compliant.TextOutput
that outputs a set of VarChar vectors
- Do a two-phase read where each reader reads the file header before moving
to its split
- Expose the functionality (initially) as a property of the TextReaderConfig
- Update the text reader's handling of projection pushdown
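
A rough sketch of the two-phase read idea, outside of Drill: this is not
Drill's TextOutput or reader API. The class name HeaderAwareCsvSketch, the
methods readHeader and readSplit, and the map-of-lists stand-in for VarChar
vectors are all hypothetical, and the comma split ignores quoting and
duplicate column names. It only illustrates that every reader can fetch the
header from the start of the file and then jump to its own byte range.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Drill.
public class HeaderAwareCsvSketch {

    // Phase 1: every reader reads the first line of the file to learn the column names.
    // RandomAccessFile.readLine maps each byte to one char, so this assumes ASCII data.
    static String[] readHeader(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            String line = raf.readLine();
            return line == null ? new String[0] : line.split(",", -1);
        }
    }

    // Phase 2: read the records whose first byte falls inside [start, end),
    // collecting one list of strings per named column.
    static Map<String, List<String>> readSplit(Path file, long start, long end)
            throws IOException {
        String[] columns = readHeader(file);
        Map<String, List<String>> vectors = new LinkedHashMap<>();
        for (String column : columns) {
            vectors.put(column, new ArrayList<>());
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            // For the first split this skips the header line. For later splits it
            // discards the tail of the record that the previous split already owns.
            raf.seek(start == 0 ? 0 : start - 1);
            raf.readLine();
            while (raf.getFilePointer() < end) {
                String line = raf.readLine();
                if (line == null) {
                    break; // end of file
                }
                if (line.isEmpty()) {
                    continue; // ignore blank lines
                }
                String[] fields = line.split(",", -1);
                for (int i = 0; i < columns.length; i++) {
                    // Pad short rows with empty strings so all columns stay aligned.
                    vectors.get(columns[i]).add(i < fields.length ? fields[i] : "");
                }
            }
        }
        return vectors;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("drill951", ".csv");
        Files.write(file, "a,b\n1,2\n3,4\n".getBytes(StandardCharsets.US_ASCII));
        long length = Files.size(file);
        // Two readers, each handling half of the file but sharing the same header.
        System.out.println(readSplit(file, 0, length / 2));
        System.out.println(readSplit(file, length / 2, length));
        Files.delete(file);
    }
}
```

In this sketch a record belongs to the split that contains its first byte, so
no row is read twice and the header is never returned as data, even by the
reader that owns the beginning of the file.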
> CSV header row should be parsed
> -------------------------------
>
> Key: DRILL-951
> URL: https://issues.apache.org/jira/browse/DRILL-951
> Project: Apache Drill
> Issue Type: New Feature
> Components: Storage - Text & CSV
> Reporter: Tomer Shiran
> Fix For: Future
>
>
> The CSV reader currently treats the header row like a regular row. There should
> be a way (perhaps optional) to treat the header row as the column names.
> I exported this dataset to a CSV:
> https://data.sfgov.org/Public-Safety/SFPD-Incidents-Previous-Three-Months/tmnf-yvry