Tim Armstrong created IMPALA-6290:
-------------------------------------

             Summary: Simplify ScannerContext buffer management to only use one I/O buffer at a time.
                 Key: IMPALA-6290
                 URL: https://issues.apache.org/jira/browse/IMPALA-6290
             Project: IMPALA
          Issue Type: Sub-task
          Components: Backend
            Reporter: Tim Armstrong
            Assignee: Tim Armstrong

I'm doing this as part of the HDFS buffer management work, but splitting it out as a subtask since it's a logically independent change.

ScannerContext currently depends on the scanners calling ReleaseCompletedResources() repeatedly to free up buffers. This works OK today, but if we add a hard limit on the number of I/O buffers, we could hit resource exhaustion if we scan too far ahead without calling ReleaseCompletedResources(). E.g. if we have 3 * 8MB I/O buffers to use and try to scan 25MB before calling ReleaseCompletedResources(), we end up in a state where all I/O buffers are sitting in the ScannerContext and none is available for the remaining data.

Certain ScannerContext operations can also exhaust the I/O buffers no matter how frequently ReleaseCompletedResources() is called. E.g. ReadBytes(25MB) or SkipBytes(25MB) would run into that problem with the current implementation, because a single call spans more than three 8MB buffers.

I spent some time looking at the ScannerContext API and the calling patterns of the scanners and came to the conclusion that there's no requirement for us to accumulate buffers in completed_io_buffers_: we don't generally assume that the memory returned from previous calls remains valid once the read position in the stream has advanced.
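
To make the 3 * 8MB example concrete, here is a minimal sketch of the two buffer-holding policies. It is a toy model only: ScanResult, ScanAccumulating and ScanOneBufferAtATime are hypothetical names invented for illustration, not Impala's classes or the actual patch, and the model assumes that every byte scanned comes from a fixed-size buffer taken from a bounded pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical toy model, not Impala code: a bounded pool of fixed-size I/O
// buffers and two policies for how a scanner context holds on to them.
constexpr int64_t kMB = 1024 * 1024;

struct ScanResult {
  bool ok;                // false if the pool ran dry before the scan finished
  int peak_buffers_held;  // most buffers held by the context at any one time
};

// Policy A: fully consumed buffers stay attached to the context (as with
// completed_io_buffers_) until the caller releases them. In this model the
// caller does not release anything during the scan.
ScanResult ScanAccumulating(int64_t bytes_to_scan, int64_t buffer_size,
                            int pool_size) {
  assert(buffer_size > 0 && pool_size > 0);
  int held = 0;
  int64_t scanned = 0;
  while (scanned < bytes_to_scan) {
    if (held == pool_size) return {false, held};  // every buffer is stuck here
    ++held;  // take another buffer from the pool
    scanned += std::min(buffer_size, bytes_to_scan - scanned);
  }
  return {true, held};
}

// Policy B: the previous buffer goes back to the pool before the next one is
// taken, so the context never holds more than one buffer.
ScanResult ScanOneBufferAtATime(int64_t bytes_to_scan, int64_t buffer_size,
                                int pool_size) {
  assert(buffer_size > 0 && pool_size > 0);
  int peak = 0;
  int64_t scanned = 0;
  while (scanned < bytes_to_scan) {
    peak = 1;  // exactly one buffer is held while its bytes are consumed
    scanned += std::min(buffer_size, bytes_to_scan - scanned);
  }
  return {true, peak};
}
```

In this model, ScanAccumulating(25 * kMB, 8 * kMB, 3) fails after 24MB with all three buffers held, whereas ScanOneBufferAtATime with the same arguments completes while holding a single buffer. The second policy is only safe under the assumption stated above, namely that callers do not rely on memory from earlier calls once the read position has advanced.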