[
https://issues.apache.org/jira/browse/DRILL-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967208#comment-16967208
]
Boaz Ben-Zvi commented on DRILL-7141:
-------------------------------------
This enhancement requires the use of good statistics (on the probe-side), as
spilling decisions (i.e., which partitions) happen during the build phase.
> Hash-Join (and Agg) should always spill to disk the least used partition
> ------------------------------------------------------------------------
>
> Key: DRILL-7141
> URL: https://issues.apache.org/jira/browse/DRILL-7141
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Relational Operators
> Affects Versions: 1.15.0
> Reporter: Kunal Khatua
> Assignee: Boaz Ben-Zvi
> Priority: Major
>
> When the probe-side data for a hash join is skewed, it is preferable to have
> the corresponding partition on the build side to be in memory.
> Currently, with the spill-to-disk feature, the partition selected for
> spilling to disk is done at random. This means that a highly skewed
> probe-side data would also spill for lack of a corresponding hash table
> partition in memory.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)