Tim Armstrong created IMPALA-6290:
-------------------------------------

             Summary: Simplify ScannerContext buffer management to only use one 
I/O buffer at a time.
                 Key: IMPALA-6290
                 URL: https://issues.apache.org/jira/browse/IMPALA-6290
             Project: IMPALA
          Issue Type: Sub-task
          Components: Backend
            Reporter: Tim Armstrong
            Assignee: Tim Armstrong


I'm doing this as part of the HDFS buffer management work but splitting it out 
as a subtask since it's a logically independent change.

ScannerContext currently depends on the scanners calling 
ReleaseCompletedResources() repeatedly to free up buffers. This works OK today, 
but if we add a hard limit on the number of I/O buffers, we could hit resource 
exhaustion by scanning too far ahead without calling 
ReleaseCompletedResources(). E.g. if we have 3 * 8MB = 24MB of I/O buffers and 
try to scan 25MB before calling ReleaseCompletedResources(), we end up in a 
state where all I/O buffers are sitting in the ScannerContext and none is free 
for the remaining 1MB.

Certain ScannerContext operations can also exhaust the I/O buffers no matter 
how frequently ReleaseCompletedResources() is called. E.g. with the same 
3 * 8MB limit, a single ReadBytes(25MB) or SkipBytes(25MB) call would run into 
that problem with the current implementation.
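
To make the two failure modes above concrete, here is a toy model. It is not 
Impala code: ToyScanContext, Advance() and ReleaseCompleted() are hypothetical 
names, and the model only counts buffers, assuming the context holds every 
buffer it has touched until the release call.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (hypothetical, not Impala code): a fixed pool of I/O buffers and
// a context that keeps every buffer it has touched until ReleaseCompleted().
struct ToyScanContext {
  int64_t buffer_size;                // bytes per I/O buffer
  int free_buffers;                   // buffers still available in the pool
  int held_buffers = 0;               // current buffer plus completed buffers
  int64_t bytes_left_in_current = 0;  // unread bytes in the current buffer

  ToyScanContext(int64_t size, int num_buffers)
      : buffer_size(size), free_buffers(num_buffers) {}

  // Advances the read position by 'len' bytes. Returns false if a new buffer
  // is needed but the pool has none left (resource exhaustion).
  bool Advance(int64_t len) {
    while (len > 0) {
      if (bytes_left_in_current == 0) {
        if (free_buffers == 0) return false;
        --free_buffers;
        ++held_buffers;
        bytes_left_in_current = buffer_size;
      }
      int64_t n = len < bytes_left_in_current ? len : bytes_left_in_current;
      bytes_left_in_current -= n;
      len -= n;
    }
    return true;
  }

  // Returns all fully consumed buffers to the pool; a partially read current
  // buffer stays held.
  void ReleaseCompleted() {
    int completed = held_buffers - (bytes_left_in_current > 0 ? 1 : 0);
    held_buffers -= completed;
    free_buffers += completed;
  }
};

const int64_t kMB = 1024 * 1024;
```

In this model, scanning in 1MB steps with a release after each step gets 
through 25MB with 3 buffers, but a single Advance(25 * kMB) fails once all 
three buffers are held, since there is no point at which the caller can 
release anything.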

I spent some time looking at the ScannerContext API and the calling patterns of 
the scanners and came to the conclusion that there's no requirement for us to 
accumulate buffers in completed_io_buffers_ - we don't generally assume that 
the memory returned from previous calls remains valid when the read position 
from the stream is advanced.
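
A minimal sketch of the one-buffer-at-a-time idea, under the assumption stated 
above that callers do not rely on earlier memory staying valid after the read 
position advances. OneBufferContext and Advance() are hypothetical names and 
this is not the actual patch; reads that must return contiguous memory across 
a buffer boundary would need a separate copy, which is not modeled here.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (hypothetical, not Impala code): the context holds at most one
// I/O buffer and hands it back to the pool before taking the next one.
struct OneBufferContext {
  int64_t buffer_size;                // bytes per I/O buffer
  int free_buffers;                   // buffers still available in the pool
  bool has_current = false;           // whether a buffer is currently held
  int64_t bytes_left_in_current = 0;  // unread bytes in the current buffer

  OneBufferContext(int64_t size, int num_buffers)
      : buffer_size(size), free_buffers(num_buffers) {}

  // Advances the read position by 'len' bytes. Memory from the previous
  // buffer becomes invalid as soon as the position moves past it.
  bool Advance(int64_t len) {
    while (len > 0) {
      if (bytes_left_in_current == 0) {
        if (has_current) {  // return the exhausted buffer first
          has_current = false;
          ++free_buffers;
        }
        if (free_buffers == 0) return false;
        --free_buffers;
        has_current = true;
        bytes_left_in_current = buffer_size;
      }
      int64_t n = len < bytes_left_in_current ? len : bytes_left_in_current;
      bytes_left_in_current -= n;
      len -= n;
    }
    return true;
  }
};

const int64_t kOneMB = 1024 * 1024;
```

With this policy a 25MB skip completes even when the pool contains a single 
8MB buffer, because the context never holds more than one buffer and no 
explicit release call is needed to make progress.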



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
