[jira] [Created] (DRILL-5661) CSV reader created, holds onto two buffers per file with headers

Paul Rogers (JIRA) Wed, 05 Jul 2017 22:09:40 -0700

Paul Rogers created DRILL-5661:
----------------------------------

             Summary: CSV reader created, holds onto two buffers per file with headers
                 Key: DRILL-5661
                 URL: https://issues.apache.org/jira/browse/DRILL-5661
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.10.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers
            Priority: Minor
             Fix For: Future



DRILL-5273 fixed a problem in the "compliant" (CSV) record reader that would
cause Drill to exhaust memory. Each reader would allocate two direct memory
blocks but not free them until the end of the fragment. Scan 1000 files and we
would hold 1000 pairs of allocations, even though only a single pair is active
at a time.
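To make the accumulation concrete, here is a minimal Java sketch under stated
assumptions. `FragmentScopedAllocator`, `openReader`, `liveBuffersAfterScan`
and the buffer sizes are hypothetical stand-ins, not Drill's `OperatorContext`
API; the only behavior modeled is the one described above, namely that every
managed buffer stays allocated until the fragment closes.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class FragmentLeakDemo {

  // Hypothetical stand-in for a fragment-scoped allocator (not Drill's API):
  // every buffer it hands out stays referenced until close() runs at the
  // end of the fragment.
  static final class FragmentScopedAllocator implements AutoCloseable {
    private final List<ByteBuffer> live = new ArrayList<>();

    ByteBuffer getManagedBuffer(int size) {
      ByteBuffer buf = ByteBuffer.allocateDirect(size);
      live.add(buf);
      return buf;
    }

    int liveBufferCount() {
      return live.size();
    }

    @Override
    public void close() {
      live.clear(); // drop every reference at once, at end of fragment
    }
  }

  // Illustrative sizes only; kept small so the demo runs anywhere.
  static final int READ_BUFFER = 1024;
  static final int WHITE_SPACE_BUFFER = 256;

  // One reader per file; each one asks for its own pair of buffers.
  static void openReader(FragmentScopedAllocator alloc) {
    alloc.getManagedBuffer(READ_BUFFER);
    alloc.getManagedBuffer(WHITE_SPACE_BUFFER);
  }

  // Number of buffers still held just before the fragment ends.
  static int liveBuffersAfterScan(int fileCount) {
    try (FragmentScopedAllocator alloc = new FragmentScopedAllocator()) {
      for (int i = 0; i < fileCount; i++) {
        openReader(alloc); // earlier readers' buffers are never released
      }
      return alloc.liveBufferCount();
    }
  }

  public static void main(String[] args) {
    System.out.println(liveBuffersAfterScan(1000)); // prints 2000
  }
}
```

Only the most recent pair is in use, yet all 2000 buffers remain allocated
until close() runs.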

As it turns out, DRILL-5273 missed a second pair of buffers created when
reading column headers:

{code}
  private String[] extractHeader() throws SchemaChangeException, IOException, ExecutionSetupException {
    ...
    // A second READ_BUFFER and WHITE_SPACE_BUFFER are requested just for the
    // header line; like the data-line pair, they are not freed until the end
    // of the fragment.
    TextInput hInput = new TextInput(settings, hStream,
        oContext.getManagedBuffer(READ_BUFFER), 0, split.getLength());
    this.reader = new TextReader(settings, hInput, hOutput,
        oContext.getManagedBuffer(WHITE_SPACE_BUFFER));
{code}

If a query uses CSV column headers, the query is subject to the same memory
exhaustion seen earlier for `columns`-style queries. (And, before DRILL-5273,
queries with column headers were twice as prone to memory exhaustion, since
each file held two pairs of buffers.)

The solution is simply to reuse the existing buffers: they are first used for
the header line, then reused for the data lines. There is no need for two sets
of buffers.
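A rough sketch of the proposed fix, again with hypothetical names
(`CountingAllocator`, `CsvReaderSketch`, `allocationsPerFile`) rather than the
actual Drill classes: the reader obtains one pair of buffers up front, parses
the header line with them, then reuses them for the data lines. The old
pattern, which requests a second pair for the header, is kept alongside for
comparison. Parsing is reduced to a naive comma split with no quoting.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class HeaderBufferReuse {

  // Hypothetical stand-in for the operator context: counts buffers handed
  // out. As described above, none are freed before the fragment ends.
  static final class CountingAllocator {
    int allocations = 0;

    ByteBuffer getManagedBuffer(int size) {
      allocations++;
      return ByteBuffer.allocate(size);
    }
  }

  static final int READ_BUFFER = 1024;       // illustrative sizes only
  static final int WHITE_SPACE_BUFFER = 256;

  // Hypothetical per-file CSV reader, not Drill's actual reader class.
  static final class CsvReaderSketch {
    private final CountingAllocator alloc;
    private final ByteBuffer readBuffer;
    private final ByteBuffer whiteSpaceBuffer; // held for the reader's lifetime

    CsvReaderSketch(CountingAllocator alloc) {
      this.alloc = alloc;
      this.readBuffer = alloc.getManagedBuffer(READ_BUFFER);
      this.whiteSpaceBuffer = alloc.getManagedBuffer(WHITE_SPACE_BUFFER);
    }

    // Old pattern: the header gets its own, second pair of buffers.
    String[] extractHeaderWithNewBuffers(String headerLine) {
      ByteBuffer hRead = alloc.getManagedBuffer(READ_BUFFER);
      alloc.getManagedBuffer(WHITE_SPACE_BUFFER);
      return parse(hRead, headerLine);
    }

    // Fixed pattern: the header line is parsed with the existing buffer.
    String[] extractHeader(String headerLine) {
      return parse(readBuffer, headerLine);
    }

    // Data lines reuse the same buffer once the header is done.
    String[] readRecord(String line) {
      return parse(readBuffer, line);
    }

    // Copies the line through the buffer, then splits on commas.
    private static String[] parse(ByteBuffer buf, String line) {
      buf.clear();
      buf.put(line.getBytes(StandardCharsets.UTF_8));
      buf.flip();
      return StandardCharsets.UTF_8.decode(buf).toString().split(",");
    }
  }

  // Buffers requested for one file with a header line and one data line.
  static int allocationsPerFile(boolean reuse) {
    CountingAllocator alloc = new CountingAllocator();
    CsvReaderSketch reader = new CsvReaderSketch(alloc);
    if (reuse) {
      reader.extractHeader("a,b,c");
    } else {
      reader.extractHeaderWithNewBuffers("a,b,c");
    }
    reader.readRecord("1,2,3");
    return alloc.allocations;
  }

  public static void main(String[] args) {
    System.out.println(allocationsPerFile(false)); // prints 4
    System.out.println(allocationsPerFile(true));  // prints 2
  }
}
```

In this model, a file with headers requests four buffers under the old
pattern and two with reuse; the header and data paths differ only in which
line they pass through the shared buffer.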



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
