[ https://issues.apache.org/jira/browse/DRILL-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15935570#comment-15935570 ]
Rahul Challapalli commented on DRILL-5267:
------------------------------------------
The fix can only be verified manually. Automating a functional test is
difficult and would require changes to the test framework. That said, I would
be very worried if a future Drill release silently started spilling to disk
more than necessary. One option is a performance test that guards against
this, since more spilling means longer query execution time.
> Managed external sort spills too often with Parquet data
> --------------------------------------------------------
>
> Key: DRILL-5267
> URL: https://issues.apache.org/jira/browse/DRILL-5267
> Project: Apache Drill
> Issue Type: Sub-task
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> DRILL-5266 describes how Parquet produces low-density record batches. Because
> of these batches, the external sort spills more frequently than it should: it
> sizes spill files based on batch size, not on the data content of the batch.
> Since Parquet batches are 95% empty space, the spill files end up far too
> small.
> Adjust the spill calculations to use the actual data content, not the size of
> the overall record batch.
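
To make the arithmetic in the description concrete, here is a minimal Java
sketch. The class, the method names, and the 16 MiB batch and 256 MiB spill
target are hypothetical and are not taken from Drill's code; only the "95%
empty" density comes from the issue text. It compares how many batches go into
one spill file when each batch is counted at its allocated size versus at its
actual data size.

```java
// Hypothetical illustration only: these names and numbers are assumptions,
// not Drill's actual API or configuration defaults.
final class SpillSizing {

    /** Fraction of a batch's allocated memory that holds real data (0.0 to 1.0). */
    public static double density(long dataBytes, long allocatedBytes) {
        if (allocatedBytes <= 0) {
            throw new IllegalArgumentException("allocatedBytes must be positive");
        }
        return (double) dataBytes / allocatedBytes;
    }

    /** Batches per spill file when each batch is charged its full allocated size. */
    public static long batchesPerSpillByAllocation(long targetSpillBytes, long allocatedBytes) {
        return Math.max(1, targetSpillBytes / Math.max(1, allocatedBytes));
    }

    /** Batches per spill file when each batch is charged only its data content. */
    public static long batchesPerSpillByData(long targetSpillBytes, long dataBytes) {
        return Math.max(1, targetSpillBytes / Math.max(1, dataBytes));
    }

    public static void main(String[] args) {
        long allocated = 16L << 20;    // assumed 16 MiB allocated per batch
        long data = allocated / 20;    // 5% real data, i.e. 95% empty space
        long target = 256L << 20;      // assumed 256 MiB desired spill file size

        long byAllocation = batchesPerSpillByAllocation(target, allocated);
        long byData = batchesPerSpillByData(target, data);

        System.out.println("density              = " + density(data, allocated));
        System.out.println("batches (allocation) = " + byAllocation);
        System.out.println("batches (data)       = " + byData);
        // Bytes that actually reach disk when the file is sized by allocation.
        System.out.println("bytes written (allocation-based) = " + byAllocation * data);
    }
}
```

Under these assumed numbers, allocation-based sizing closes a spill file after
16 batches, which is only about 5% of the intended file size on disk, while
data-based sizing fits 320 batches into the same target.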