[ 
https://issues.apache.org/jira/browse/DRILL-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15935570#comment-15935570
 ] 

Rahul Challapalli commented on DRILL-5267:
------------------------------------------

The fix can only be verified manually. Automating a functional test is 
challenging and would require changes to the test framework. That said, I would 
be very worried if Drill suddenly started spilling more than necessary to disk 
in the future. One option is to have a performance test validate it, since more 
spilling means longer query execution time.

> Managed external sort spills too often with Parquet data
> --------------------------------------------------------
>
>                 Key: DRILL-5267
>                 URL: https://issues.apache.org/jira/browse/DRILL-5267
>             Project: Apache Drill
>          Issue Type: Sub-task
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.10.0
>
>
> DRILL-5266 describes how Parquet produces low-density record batches. The 
> result of these batches is that the external sort spills more frequently than 
> it should because it sizes spill files based on batch size, not data content 
> of the batch. Since Parquet batches are 95% empty space, the spill files end 
> up far too small.
> Adjust the spill calculations based on actual data content, not the size of 
> the overall record batch.
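
To illustrate the sizing problem described above, here is a minimal sketch of
the arithmetic. It is not Drill's actual code: the class `SpillSizing`, its
methods, the 256 MiB spill-file target, and the 16 MiB batch allocation are
all hypothetical values chosen for illustration. Only the 5% density figure
comes from the issue description ("95% empty space").

```java
// Hypothetical illustration only; names and numbers are assumptions,
// not taken from the Drill code base.
class SpillSizing {

    /** Bytes of real data in a batch, given its allocated size and how full it is. */
    static long dataBytes(long allocatedBytes, int percentFull) {
        if (percentFull < 1 || percentFull > 100) {
            throw new IllegalArgumentException("percentFull must be in 1..100");
        }
        return Math.max(1, allocatedBytes * percentFull / 100);
    }

    /** How many batches to buffer before writing one spill file of the target size. */
    static long batchesPerSpill(long targetSpillFileBytes, long bytesPerBatch) {
        if (bytesPerBatch <= 0) {
            throw new IllegalArgumentException("bytesPerBatch must be positive");
        }
        return Math.max(1, targetSpillFileBytes / bytesPerBatch);
    }

    public static void main(String[] args) {
        long targetSpillFile = 256L * 1024 * 1024; // assumed target spill file size
        long allocatedPerBatch = 16L * 1024 * 1024; // assumed memory held by one batch
        int percentFull = 5;                        // "95% empty space" per the issue

        // Sizing by allocated batch size: spills after few batches,
        // so each spill file holds little real data.
        long byBatchSize = batchesPerSpill(targetSpillFile, allocatedPerBatch);

        // Sizing by actual data content: many more batches fit in one spill file.
        long byDataContent =
            batchesPerSpill(targetSpillFile, dataBytes(allocatedPerBatch, percentFull));

        System.out.println("Batches per spill, sized by batch size:   " + byBatchSize);
        System.out.println("Batches per spill, sized by data content: " + byDataContent);
    }
}
```

Under these assumed numbers, sizing by allocated batch size triggers a spill
after 16 batches, while sizing by data content allows 320 batches per spill
file, a 20x reduction in the number of spill files for the same input.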



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
