Paul Rogers created DRILL-5273:
----------------------------------
Summary: ScanBatch exhausts memory when reading 5000 small files
Key: DRILL-5273
URL: https://issues.apache.org/jira/browse/DRILL-5273
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.10
Reporter: Paul Rogers
Assignee: Paul Rogers
Fix For: 1.10
A test case was created consisting of 5000 text files, each containing a single
line with the file number (1 to 5000). Each file therefore holds a single record
of at most 4 characters.
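For reference, such a fixture can be generated with a small program like the
following (the directory path and file naming are assumptions, not part of the
issue):
{code}
// Hypothetical generator for the test fixture: 5000 one-line text files.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MakeFiles {
  public static void main(String[] args) throws IOException {
    Path dir = Paths.get("/data/5000files/text");   // assumed location under dfs.data
    Files.createDirectories(dir);
    for (int i = 1; i <= 5000; i++) {
      // Each file holds a single record: the file number, at most 4 characters.
      Files.write(dir.resolve("file" + i + ".txt"), (i + "\n").getBytes());
    }
  }
}
{code}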
Run the following query:
{code}
SELECT * FROM `dfs.data`.`5000files/text`
{code}
The query fails with an out-of-memory (OOM) error in the scan batch at around
record 3700 on a Mac with 4 GB of direct memory.
The code that reads records in {{ScanBatch}} is complex. The following appears
to occur:
* Iterate over the record readers, one per file.
* For each reader, call its {{setup()}} method (a simplified sketch of this loop appears below).
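In outline, the reader-switching logic looks roughly like this (a sketch only,
not the actual {{ScanBatch}} source; {{IterOutcome}} is Drill's standard
batch-status enum):
{code}
// Simplified sketch of the ScanBatch.next() reader loop; not the real implementation.
while (true) {
  int recordCount = currentReader.next();    // read a batch from the current file
  if (recordCount > 0) {
    return IterOutcome.OK;                   // pass the batch downstream
  }
  if (!readers.hasNext()) {
    return IterOutcome.NONE;                 // all files consumed
  }
  currentReader = readers.next();
  currentReader.setup(oContext, mutator);    // allocates two fresh managed buffers
}
{code}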
The setup code is:
{code}
public void setup(OperatorContext context, OutputMutator outputMutator)
    throws ExecutionSetupException {
  oContext = context;
  readBuffer = context.getManagedBuffer(READ_BUFFER);
  whitespaceBuffer = context.getManagedBuffer(WHITE_SPACE_BUFFER);
  // ...
}
{code}
Both buffers are allocated from direct memory. No code releases them when a
reader finishes; managed buffers are freed only when the operator context itself
closes, so each new reader adds another pair of buffers.
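One plausible fix (a sketch only; it assumes the reader retains the two buffer
references from {{setup()}} and that they are reference-counted {{DrillBuf}}s)
is to release the buffers when each reader is closed:
{code}
// Sketch of a possible fix: release the per-reader buffers on close.
// Assumes readBuffer and whitespaceBuffer are the DrillBufs obtained in setup().
@Override
public void close() {
  if (readBuffer != null) {
    readBuffer.release();       // DrillBuf is reference-counted; release() frees direct memory
    readBuffer = null;
  }
  if (whitespaceBuffer != null) {
    whitespaceBuffer.release();
    whitespaceBuffer = null;
  }
}
{code}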
The sizes are:
{code}
private static final int READ_BUFFER = 1024*1024;        // 1,048,576 bytes
private static final int WHITE_SPACE_BUFFER = 64*1024;   //    65,536 bytes
// Total per reader: 1,048,576 + 65,536 = 1,114,112 bytes
{code}
This is exactly the amount by which memory grows each time {{ScanBatch}} starts
a new reader, as the memory trace across {{next()}} calls shows:
{code}
Ctor: 0 -- Initial memory in constructor
Init setup: 1114112 -- After call to first record reader setup
Entry Memory: 1114112 -- first next() call, returns one record
Entry Memory: 1114112 -- second next(), eof and start second reader
Entry Memory: 2228224 -- third next(), second reader returns EOF
...
{code}
At 1,114,112 bytes (just over 1 MiB) leaked per file, 5000 files leak more than
5 GiB, which explains the OOM when only 4 GB of direct memory is available.
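The exact figures:
{code}
5,000 files * 1,114,112 bytes/file = 5,570,560,000 bytes ~= 5.19 GiB
{code}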