Paul Rogers created DRILL-5134:
----------------------------------

             Summary: TestMergeJoinWithSchemaChanges throws exception with paged SV4
                 Key: DRILL-5134
                 URL: https://issues.apache.org/jira/browse/DRILL-5134
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.9.0
            Reporter: Paul Rogers
            Priority: Minor


The {{TestMergeJoinWithSchemaChanges}} test exercises the in-memory merge sort 
with union vectors. (Note that union vectors are not fully supported.)

The merge sort creates an SV4 (a four-byte selection vector) to hold an index 
into the sorted results. An SV4 can page its results, handing them to the 
downstream operator one batch at a time.
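
The paging behavior described above can be sketched roughly as follows. This 
is an illustrative stand-in, not Drill's actual SV4 class: the name 
{{PagedIndexSketch}} and its methods are hypothetical, and the layout (upper 16 
bits for the batch index, lower 16 bits for the row within that batch) is an 
assumption made for the sketch.

```java
/** Illustrative paged index; a hypothetical stand-in, not Drill's SV4 class. */
final class PagedIndexSketch {
    private final int[] entries;  // one packed entry per sorted row
    private final int pageSize;   // maximum rows handed out per page
    private int start = 0;        // position of the current page's first entry
    private int length = 0;       // number of rows in the current page

    PagedIndexSketch(int[] entries, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.entries = entries.clone();
        this.pageSize = pageSize;
    }

    /** Packs a (batch, row) pair; the 16/16 bit split is assumed here. */
    static int pack(int batch, int row) {
        return (batch << 16) | (row & 0xFFFF);
    }

    static int batchOf(int entry) {
        return entry >>> 16;
    }

    static int rowOf(int entry) {
        return entry & 0xFFFF;
    }

    /** Advances to the next page; returns false once all rows are consumed. */
    boolean next() {
        start += length;
        if (start >= entries.length) {
            length = 0;
            return false;
        }
        length = Math.min(pageSize, entries.length - start);
        return true;
    }

    /** Number of rows in the current page. */
    int count() {
        return length;
    }

    /** Returns the packed entry at position i within the current page. */
    int get(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException(
                "index " + i + ", page length " + length);
        }
        return entries[start + i];
    }
}
```

In this sketch, a consumer must index each page from 0 to {{count() - 1}}. A 
consumer that instead uses the total row count will run past the end of the 
current page, which is one plausible way an index out of range exception 
could arise; the actual cause of the failure is not identified in this ticket.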

When {{TestMergeJoinWithSchemaChanges}} is run using the "managed" external 
sort and union vectors, a downstream operator throws an index out of range 
exception. However, when run with the "classic" external sort, no such 
exception is thrown.

The difference is that the classic version returns all rows in a single batch, 
while the managed version attempts to return rows in batches of a specified 
size.

The paging approach works for tests that do not include union vectors, but 
fails for those that do include them.

Modifying the managed version to return all results in a single batch avoids 
the exception.

The problem with this workaround is that, beyond a certain size, sorted 
results cannot be returned in a single batch and paging becomes necessary. 
The sort buffer can, for example, be set to 10 GB, which is too large for a 
single batch. Or the sort can process more than 64K rows, which is also more 
than a single batch can hold. In those scenarios, union vectors with a paged 
SV4 will fail.
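
As a rough illustration of the row-count limit, the arithmetic below shows 
when paging becomes unavoidable. The class and method names are hypothetical, 
and "64K" is taken to mean 65,536 rows, which is an assumption of this sketch.

```java
/** Hypothetical helper showing when a sorted result needs more than one batch. */
final class BatchCountSketch {
    // "64K" rows per batch, assumed here to mean 65,536.
    static final int MAX_ROWS_PER_BATCH = 65_536;

    /** True if all rows fit in one batch under the assumed row limit. */
    static boolean fitsInSingleBatch(long rowCount) {
        return rowCount <= MAX_ROWS_PER_BATCH;
    }

    /** Number of batches needed to return rowCount rows (ceiling division). */
    static long batchesNeeded(long rowCount, int rowsPerBatch) {
        if (rowCount < 0) {
            throw new IllegalArgumentException("rowCount must be non-negative");
        }
        if (rowsPerBatch <= 0 || rowsPerBatch > MAX_ROWS_PER_BATCH) {
            throw new IllegalArgumentException("rowsPerBatch out of range");
        }
        return (rowCount + rowsPerBatch - 1) / rowsPerBatch;
    }
}
```

For example, a sort of 100,000 rows does not fit in a single batch under this 
limit and needs at least two batches, so the single-batch workaround cannot 
apply to it.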

Because union vectors are not fully supported, the workaround described above 
is used to get the test to pass. This ticket records the issue for a future 
time when we attempt to support union vectors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
