[
https://issues.apache.org/jira/browse/DRILL-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Khurram Faraaz updated DRILL-5662:
----------------------------------
Component/s: Storage - Text & CSV
> Compliant text reader (CSV) opens, closes, reopens file with headers
> --------------------------------------------------------------------
>
> Key: DRILL-5662
> URL: https://issues.apache.org/jira/browse/DRILL-5662
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
> Fix For: Future
>
>
> The "compliant" (CSV) reader can optionally read headers from a file. To do so,
> the reader:
> * Opens the input stream
> * Reads headers
> * Closes the input stream
> * Opens the input stream
> * Reads data (skipping headers)
> * Closes the input stream
> While the above works, the close/open cycle between the two reads is
> unnecessary. Many CSV readers simply read the header and then use the same
> stream to read the data. Drill should do the same.
> In fact, Drill has historically coded its own header scanner. The first was
> badly broken, but DRILL-5498 improved the parsing (though not the file
> handling). Given that Drill's "compliant" text reader is based on the
> UniVocity library, and that library can parse headers, we should probably
> just reuse that existing code, which has very likely evolved to handle the
> header variations seen in the wild.
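
For illustration only, the single-open flow proposed in the description (open once, read the header, keep reading data from the same stream, close once) might look like the sketch below. This is not Drill's reader code and it does not use the UniVocity API: the class name SinglePassCsv, the Result holder, and the naive comma split (no quoting or escaping support) are all assumptions made to keep the example self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not part of Drill or UniVocity.
public class SinglePassCsv {

    /** Header names plus data rows, all read from one open stream. */
    public static final class Result {
        public final String[] headers;
        public final List<String[]> rows;

        Result(String[] headers, List<String[]> rows) {
            this.headers = headers;
            this.rows = rows;
        }
    }

    /**
     * Reads the header line and then the data lines from the same reader.
     * The stream is opened by the caller once and closed here once; there
     * is no close/reopen step between the header and the data.
     */
    public static Result read(Reader source) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            String headerLine = in.readLine();
            if (headerLine == null) {
                // Empty input: no headers and no rows.
                return new Result(new String[0], new ArrayList<>());
            }
            // Naive split (assumption): real CSV needs quote handling.
            String[] headers = headerLine.split(",", -1);

            List<String[]> rows = new ArrayList<>();
            String line;
            // Continue on the same stream; the header is already consumed,
            // so nothing has to be skipped.
            while ((line = in.readLine()) != null) {
                rows.add(line.split(",", -1));
            }
            return new Result(headers, rows);
        }
    }

    public static void main(String[] args) throws IOException {
        Result r = read(new StringReader("id,name\n1,alpha\n2,beta\n"));
        System.out.println(String.join("|", r.headers)
                + " rows=" + r.rows.size());
    }
}
```

Because the header is consumed from the same stream that supplies the data, the "skipping headers" step from the list in the description disappears along with the second open.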