[jira] [Updated] (DRILL-5661) CSV reader created, holds onto two buffers per file with headers

Khurram Faraaz (JIRA) Wed, 05 Jul 2017 22:50:04 -0700

     [ 
https://issues.apache.org/jira/browse/DRILL-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Khurram Faraaz updated DRILL-5661:
----------------------------------
    Component/s: Storage - Text & CSV

> CSV reader created, holds onto two buffers per file with headers
> ----------------------------------------------------------------
>
>                 Key: DRILL-5661
>                 URL: https://issues.apache.org/jira/browse/DRILL-5661
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: Future
>
>
> DRILL-5273 fixed a problem in the "compliant" (CSV) record reader that would 
> cause Drill to exhaust memory. Each reader allocated two direct memory 
> blocks, but did not free them until the end of the fragment. Scan 1000 files 
> and we would get 1000 pairs of allocations, with only a single pair in use at 
> any time.
> As it turns out, DRILL-5273 missed a second pair of buffers, created when 
> reading column headers:
> {code}
>  private String[] extractHeader()
>      throws SchemaChangeException, IOException, ExecutionSetupException {
> ...
>     TextInput hInput = new TextInput(settings, hStream,
>         oContext.getManagedBuffer(READ_BUFFER), 0, split.getLength());
>     this.reader = new TextReader(settings, hInput, hOutput,
>         oContext.getManagedBuffer(WHITE_SPACE_BUFFER));
> {code}
> If a query uses CSV column headers, it is subject to the same memory 
> exhaustion seen earlier for `columns` style queries. (Before DRILL-5273, 
> queries with column headers were doubly exposed to memory exhaustion.)
> The solution is simply to reuse the existing buffers: they are used first 
> for the header line, then reused for the data lines. There is no need for 
> two sets of buffers.
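
The effect of the proposed fix can be illustrated with a small standalone sketch. It does not use Drill's classes: `ManagedAllocator` is a hypothetical stand-in for an operator context whose managed buffers stay allocated until the fragment ends, the buffer sizes are illustrative only, and `readLine` is a placeholder for the real header and data parsing. It only counts how many buffers remain allocated under each strategy.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class BufferReuseSketch {
    // Illustrative sizes only; not the values Drill uses.
    static final int READ_BUFFER = 1024;
    static final int WHITE_SPACE_BUFFER = 64;

    /** Hypothetical stand-in for an operator context: buffers are held until the fragment ends. */
    static class ManagedAllocator {
        private final List<ByteBuffer> live = new ArrayList<>();

        ByteBuffer getManagedBuffer(int size) {
            ByteBuffer buf = ByteBuffer.allocate(size);
            live.add(buf); // never released before end of fragment
            return buf;
        }

        int liveCount() {
            return live.size();
        }
    }

    /** Placeholder for parsing one line: resets both buffers and writes a byte. */
    static void readLine(ByteBuffer readBuf, ByteBuffer whitespaceBuf) {
        readBuf.clear();
        whitespaceBuf.clear();
        readBuf.put((byte) 'x');
    }

    /** Behavior described in the issue: every file's header read allocates a fresh pair. */
    static int scanAllocatingPerHeader(int fileCount) {
        ManagedAllocator ctx = new ManagedAllocator();
        ByteBuffer readBuf = ctx.getManagedBuffer(READ_BUFFER);
        ByteBuffer wsBuf = ctx.getManagedBuffer(WHITE_SPACE_BUFFER);
        for (int i = 0; i < fileCount; i++) {
            ByteBuffer hRead = ctx.getManagedBuffer(READ_BUFFER);
            ByteBuffer hWs = ctx.getManagedBuffer(WHITE_SPACE_BUFFER);
            readLine(hRead, hWs);     // header line
            readLine(readBuf, wsBuf); // data lines
        }
        return ctx.liveCount();
    }

    /** Proposed behavior: the same pair serves the header line, then the data lines. */
    static int scanReusingBuffers(int fileCount) {
        ManagedAllocator ctx = new ManagedAllocator();
        ByteBuffer readBuf = ctx.getManagedBuffer(READ_BUFFER);
        ByteBuffer wsBuf = ctx.getManagedBuffer(WHITE_SPACE_BUFFER);
        for (int i = 0; i < fileCount; i++) {
            readLine(readBuf, wsBuf); // header line
            readLine(readBuf, wsBuf); // data lines, same buffers
        }
        return ctx.liveCount();
    }

    public static void main(String[] args) {
        System.out.println("per-header allocation: " + scanAllocatingPerHeader(1000));
        System.out.println("reused buffers:        " + scanReusingBuffers(1000));
    }
}
```

Under these assumptions, the per-header strategy leaves two buffers per file outstanding in addition to the shared pair, while the reuse strategy holds a single pair regardless of how many files are scanned.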



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
