Paul Rogers created DRILL-5100:
----------------------------------

             Summary: External Sort does not manage memory requirements of a schema change
                 Key: DRILL-5100
                 URL: https://issues.apache.org/jira/browse/DRILL-5100
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.8.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers


The external sort is given a fixed amount of memory to hold buffered in-memory 
batches prior to spilling. External sort also handles certain schema changes 
when union vectors are enabled. When a schema change occurs, existing vectors 
are coerced into the new schema format, perhaps replacing an existing vector 
with a new union vector.

This conversion requires (direct) memory. If it occurs when the external sort 
has already nearly filled its in-memory buffer, the conversion can overflow the 
sort's memory and cause the sort to fail.

The following shows the allocated memory before and after each schema change in 
the unit test {{TestExternalSort.testNumericTypes}}:

{code}
Before: 134144
After: 150528
Before: 150528
After: 166912
{code}

The allocation grows by 16384 at each schema change, so union vectors appear to 
be larger than the original BIGINT vectors. External sort must anticipate this 
growth, spilling first if necessary, to ensure sufficient room exists for the 
new, larger vectors.
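
As a rough illustration of the check described above, the sketch below computes the growth seen in the test output and decides whether a spill would be needed before accepting a schema change. The class, its method names, and the 180000 limit are hypothetical and are not part of Drill's code; the figures are presumably bytes, although the test output does not say so.

```java
// Hypothetical sketch; none of these names exist in Drill.
final class SchemaChangeHeadroom {
    private SchemaChangeHeadroom() {}

    /** Growth in allocated memory caused by one schema change. */
    static long growth(long before, long after) {
        return after - before;
    }

    /**
     * Returns true if the expected growth would push the allocation past
     * the sort's memory limit, meaning the sort should spill first.
     */
    static boolean shouldSpillBeforeSchemaChange(long allocated,
                                                 long expectedGrowth,
                                                 long limit) {
        return allocated + expectedGrowth > limit;
    }

    public static void main(String[] args) {
        // Figures taken from TestExternalSort.testNumericTypes above.
        long first = growth(134144, 150528);
        long second = growth(150528, 166912);
        System.out.println(first + " " + second);

        // Assumed limit, chosen only for illustration.
        long limit = 180000;
        System.out.println(shouldSpillBeforeSchemaChange(166912, second, limit));
    }
}
```

A real implementation would have to estimate the growth before the conversion happens, for example from the observed growth of earlier schema changes; this sketch simply takes the estimate as an input.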

Further, the conversion process itself requires that two copies of each vector 
be in memory: the original and the new, converted one. The external sort does 
not check to ensure this much working memory is available, leading to potential 
OOM errors during each vector conversion.
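
The working-memory requirement can be sketched as a simple simulation. It assumes that vectors are converted one at a time and that each original is released as soon as its converted copy exists; the issue does not confirm this, and the class and method names are hypothetical rather than Drill's.

```java
// Hypothetical sketch; none of these names exist in Drill.
final class ConversionWorkingMemory {
    private ConversionWorkingMemory() {}

    /**
     * Simulates converting vectors one at a time and returns the peak
     * allocation. The starting allocation already includes the originals.
     */
    static long peakDuringConversion(long allocated,
                                     long[] originalSizes,
                                     long[] convertedSizes) {
        long current = allocated;
        long peak = allocated;
        for (int i = 0; i < originalSizes.length; i++) {
            // The converted copy is allocated while the original is still held.
            current += convertedSizes[i];
            peak = Math.max(peak, current);
            // Assumption: the original is released once the copy is complete.
            current -= originalSizes[i];
        }
        return peak;
    }

    /** Returns true if the whole conversion stays within the memory limit. */
    static boolean conversionFits(long allocated,
                                  long[] originalSizes,
                                  long[] convertedSizes,
                                  long limit) {
        return peakDuringConversion(allocated, originalSizes, convertedSizes) <= limit;
    }
}
```

Under these assumptions, the peak exceeds the final allocation, so a check against only the post-conversion size would still miss the transient second copy.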



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
