[
https://issues.apache.org/jira/browse/DRILL-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Khurram Faraaz updated DRILL-5661:
----------------------------------
Component/s: Storage - Text & CSV
> CSV reader created, holds onto two buffers per file with headers
> ----------------------------------------------------------------
>
> Key: DRILL-5661
> URL: https://issues.apache.org/jira/browse/DRILL-5661
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
> Fix For: Future
>
>
> DRILL-5273 fixed a problem in the "compliant" (CSV) record reader that could
> cause Drill to exhaust memory. Each reader allocated two direct memory
> buffers, but did not free them until the end of the fragment. Scanning 1000
> files therefore produced 1000 pairs of allocations, with only a single pair
> active at any time. As it turns out, DRILL-5273 missed a second pair of
> buffers, created when reading column headers:
> {code}
> private String [] extractHeader() throws SchemaChangeException, IOException,
> ExecutionSetupException{
> ...
> TextInput hInput = new TextInput(settings, hStream,
> oContext.getManagedBuffer(READ_BUFFER), 0, split.getLength());
> this.reader = new TextReader(settings, hInput, hOutput,
> oContext.getManagedBuffer(WHITE_SPACE_BUFFER));
> {code}
> If a query uses CSV column headers, it is subject to the same memory
> exhaustion seen earlier for `columns`-style queries. (And, before DRILL-5273,
> queries with column headers were doubly exposed, since each file held two
> pairs of buffers.)
> The solution is simply to reuse the existing buffers: they are used first
> for the header line, then reused for the data lines. There is no need for
> two sets of buffers.
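
The reuse idea described above can be sketched in isolation. This is a minimal illustration, not Drill's actual code: `CountingAllocator`, `SketchCsvReader`, and `BufferReuseSketch` are hypothetical names, plain `java.nio.ByteBuffer` stands in for Drill's managed buffers, and the buffer sizes are arbitrary. The only point it makes is that one pair of buffers, allocated once per reader, can serve both the header line and the data lines.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical allocator that counts how many direct buffers were handed out. */
final class CountingAllocator {
    private int allocations;

    ByteBuffer allocate(int size) {
        allocations++;
        return ByteBuffer.allocateDirect(size);
    }

    int allocations() {
        return allocations;
    }
}

/** Hypothetical reader: one buffer pair, shared by header and data parsing. */
final class SketchCsvReader {
    // Sizes are assumptions for the sketch, not Drill's real values.
    static final int READ_BUFFER = 1024;
    static final int WHITE_SPACE_BUFFER = 64;

    private final ByteBuffer readBuffer;
    // Allocated only to mirror the second buffer in the pair; unused here.
    private final ByteBuffer whitespaceBuffer;

    SketchCsvReader(CountingAllocator allocator) {
        // The only allocations this reader ever makes.
        this.readBuffer = allocator.allocate(READ_BUFFER);
        this.whitespaceBuffer = allocator.allocate(WHITE_SPACE_BUFFER);
    }

    /** Parses the header line using the existing read buffer. */
    String[] extractHeader(String headerLine) {
        return parse(headerLine);
    }

    /** Parses a data line using the same read buffer. */
    String[] readDataLine(String dataLine) {
        return parse(dataLine);
    }

    // Copies the line through the shared buffer, then splits on commas.
    // The line must fit in READ_BUFFER bytes; no quoting or escaping is handled.
    private String[] parse(String line) {
        byte[] bytes = line.getBytes(StandardCharsets.UTF_8);
        readBuffer.clear();   // reset position and limit so the buffer can be reused
        readBuffer.put(bytes);
        readBuffer.flip();    // switch from writing to reading
        byte[] copy = new byte[readBuffer.remaining()];
        readBuffer.get(copy);
        return new String(copy, StandardCharsets.UTF_8).split(",", -1);
    }
}

final class BufferReuseSketch {
    public static void main(String[] args) {
        CountingAllocator allocator = new CountingAllocator();
        SketchCsvReader reader = new SketchCsvReader(allocator);
        String[] header = reader.extractHeader("id,name");
        String[] row = reader.readDataLine("1,drill");
        // prints "name=drill, allocations=2"
        System.out.println(header[1] + "=" + row[1]
            + ", allocations=" + allocator.allocations());
    }
}
```

In this sketch the allocation count stays at two however many lines are parsed, which is the behavior the proposed fix aims for per file.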