[ https://issues.apache.org/jira/browse/HIVE-20491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603246#comment-16603246 ]
Ashutosh Chauhan commented on HIVE-20491:
-----------------------------------------
[~kgyrtkirk] The kind of hashtable is selected by the Vectorizer, which runs
*after* ConvertJoinMapJoin, which does the algorithm selection. I see you have updated
the size computation assuming a fast hashtable, but wouldn't it be better to first do
the memory computation using the optimized version and then using fast? If fast
qualifies, set that in the Join so that the Vectorizer can pick the correct hashtable type.
That said, since fast hashtables are bigger, the current approach also works, though it's
more conservative than needed.
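A minimal sketch of the ordering suggested above, in Python for brevity: estimate the "optimized" footprint first, then the "fast" one, and record the outcome on the join so a later pass can read it instead of deciding on its own. The estimator callables, the MapJoinDecision record, hashtable_kind and the threshold are all hypothetical stand-ins, not Hive's actual ConvertJoinMapJoin or Vectorizer API; the only thing assumed about the two implementations is what this issue states, that fast has the larger footprint.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical estimator type: maps a key count to an estimated hashtable size.
Estimator = Callable[[int], int]


@dataclass
class MapJoinDecision:
    """Hypothetical stand-in for what the join would carry to the Vectorizer."""
    use_map_join: bool
    use_fast_hashtable: bool
    estimated_size: Optional[int]


def choose_mapjoin(num_keys: int, threshold: int,
                   estimate_optimized: Estimator,
                   estimate_fast: Estimator) -> MapJoinDecision:
    # Step 1: the smaller "optimized" footprint decides whether a map join fits at all.
    optimized = estimate_optimized(num_keys)
    if optimized > threshold:
        return MapJoinDecision(False, False, None)
    # Step 2: only if the larger "fast" footprint also fits, mark the join as fast.
    fast = estimate_fast(num_keys)
    if fast <= threshold:
        return MapJoinDecision(True, True, fast)
    # Otherwise keep the map join, but with the optimized hashtable.
    return MapJoinDecision(True, False, optimized)


def hashtable_kind(decision: MapJoinDecision) -> str:
    # What a later vectorization pass could read instead of re-deciding.
    return "FAST" if decision.use_fast_hashtable else "OPTIMIZED"
```

For example, plugging in the 25M-key runtime measurements from the table in the description (1168439664 for optimized, 1513584984 for fast) with a hypothetical threshold of 1400000000 keeps the map join and selects OPTIMIZED, while a threshold of 2000000000 selects FAST. Always sizing with the fast footprint would reject the map join outright at 1400000000, which is the "more conservative than needed" case.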
> Fix mapjoin size estimations for Fast implementation
> ----------------------------------------------------
>
> Key: HIVE-20491
> URL: https://issues.apache.org/jira/browse/HIVE-20491
> Project: Hive
> Issue Type: Improvement
> Components: Statistics
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Major
> Attachments: HIVE-20491.01.patch, HIVE-20491.01wip02.patch,
> HIVE-20491.02.patch
>
>
> HIVE-19824 fixed the estimations, but it calculated them for the "optimized"
> implementation; the "fast" one has a slightly bigger footprint.
> It also seems like fast is somewhat overestimated at runtime; that should
> also be taken care of.
> | numkeys | implementation | compiler estimation | runtime estimation | runtime measurement | ce / rm | re / rm |
> | 25M | FAST | 1168435456 | 2189433712 | 1513584984 | 0.77 | 1.44 |
> | 25M | OPTIMIZED | 1168435456 | 1191203764 | 1168439664 | 1.00 | 1.01 |
> (ce = compiler estimation, re = runtime estimation, rm = runtime measurement)
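The two ratio columns are plain quotients of the size columns: ce / rm is compiler estimation divided by runtime measurement, and re / rm is runtime estimation divided by runtime measurement. A small Python sketch recomputing them from the 25M-key rows, printed to three decimals:

```python
from typing import Tuple

# Sizes copied from the table: (compiler estimation, runtime estimation, runtime measurement).
ROWS = {
    "FAST": (1168435456, 2189433712, 1513584984),
    "OPTIMIZED": (1168435456, 1191203764, 1168439664),
}


def ratios(compiler_est: int, runtime_est: int, runtime_meas: int) -> Tuple[float, float]:
    # Values below 1 mean the estimate is smaller than what was measured.
    return compiler_est / runtime_meas, runtime_est / runtime_meas


for impl, sizes in ROWS.items():
    ce_rm, re_rm = ratios(*sizes)
    print(f"{impl}: ce/rm={ce_rm:.3f} re/rm={re_rm:.3f}")
```

This prints 0.772 and 1.447 for FAST, and 1.000 and 1.019 for OPTIMIZED. The compiler estimation is the same 1168435456 for both implementations, so it matches the optimized measurement almost exactly but undershoots the fast one by roughly 23%, while the fast runtime estimation overshoots by roughly 45%, which is the runtime overestimation mentioned above.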
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)