Paul Rogers created DRILL-5662:
----------------------------------
Summary: Compliant text reader (CSV) opens, closes, reopens file
with headers
Key: DRILL-5662
URL: https://issues.apache.org/jira/browse/DRILL-5662
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Paul Rogers
Assignee: Paul Rogers
Priority: Minor
Fix For: Future
The "compliant" (CSV) reader can optionally read headers from a file. To do so,
the reader:
* Opens the input stream
* Reads headers
* Closes the input stream
* Opens the input stream
* Reads data (skipping headers)
* Closes the input stream
While the above certainly works, it has an unnecessary close/open cycle. Many
CSV readers simply read the header and then use the same stream to read the
data. Drill should do the same.
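
A minimal sketch of the single-open approach, in plain Java rather than
Drill's actual reader classes. The names SingleOpenCsvRead, Result, read and
splitLine are hypothetical, and the naive comma split only stands in for real
CSV parsing (no quoting or escaping). The point is that the header line and
the data rows come from the same stream, which is opened and closed once:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SingleOpenCsvRead {

    /** Header names plus data rows, both taken from one pass over the input. */
    static final class Result {
        final List<String> headers;
        final List<List<String>> rows;

        Result(List<String> headers, List<List<String>> rows) {
            this.headers = headers;
            this.rows = rows;
        }
    }

    /** Reads the header line, then keeps reading data from the same stream. */
    static Result read(Reader in) {
        // The stream is wrapped once and closed once, with no reopen in between.
        try (BufferedReader reader = new BufferedReader(in)) {
            String headerLine = reader.readLine();
            List<String> headers =
                headerLine == null ? new ArrayList<>() : splitLine(headerLine);

            List<List<String>> rows = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(splitLine(line));
            }
            return new Result(headers, rows);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical stand-in for a real CSV field parser. */
    static List<String> splitLine(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    public static void main(String[] args) {
        Result r = read(new StringReader("a,b\n1,2\n3,4\n"));
        System.out.println(r.headers + " " + r.rows);
    }
}
```

In a real reader the header parser and the data parser would share the
quoting and escaping rules; the sketch ignores that to keep the stream
handling visible.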
In fact, Drill has historically coded its own header scanner. The first was
badly broken, but DRILL-5498 improved the parsing (though not the file
handling). Given that Drill's "compliant" text reader is based on the
UniVocity library, and that library can parse headers, we should probably just
reuse that existing code, which has very likely evolved to handle the header
usages seen in the wild.