[ 
https://issues.apache.org/jira/browse/HIVE-20366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16577380#comment-16577380
 ] 

Vineet Garg commented on HIVE-20366:
------------------------------------

The idea I am experimenting with is to use the difference in NDVs (number of 
distinct values) to estimate a cardinality, subtract that from the estimated 
join cardinality to get the number of mismatched rows, and then use the 
mismatched rows to estimate the number of nulls.
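
To make the arithmetic concrete, here is a minimal sketch of one possible 
reading of that idea. It is not the code in the attached patches. The function 
name, its parameters, and the assumption that rows are spread uniformly over 
the distinct key values are all hypothetical and only for illustration.

```python
def estimate_null_rows(join_rows: int, left_ndv: int, right_ndv: int) -> int:
    """Estimate rows of a left outer join whose right-side columns are NULL.

    Hypothetical illustration only; assumes rows are spread uniformly
    across distinct join-key values.
    """
    if join_rows <= 0 or left_ndv <= 0:
        return 0

    # Distinct left keys that can find a partner on the right side.
    matched_ndv = min(left_ndv, max(right_ndv, 0))

    # Cardinality attributed to matching keys, scaled by the NDV ratio.
    matched_rows = join_rows * matched_ndv // left_ndv

    # Subtract from the join cardinality to get the mismatched rows;
    # in a left outer join these carry NULLs in the right-side columns.
    return join_rows - matched_rows


# Example: 1000 join rows, 100 distinct keys on the left, 80 on the right.
print(estimate_null_rows(1000, 100, 80))
```

Under these assumptions, an "is null" filter on a right-side column would only 
be estimated at close to zero rows when the right side covers all of the left 
side's distinct keys.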

> TPC-DS query78 stats estimates are off for is null filter
> ---------------------------------------------------------
>
>                 Key: HIVE-20366
>                 URL: https://issues.apache.org/jira/browse/HIVE-20366
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: Vineet Garg
>            Assignee: Vineet Garg
>            Priority: Major
>         Attachments: HIVE-20366.1.patch, HIVE-20366.2.patch
>
>
> In Query 78, there are left outer joins (LOJ) between fact table pairs: 
> store_sales LOJ store_returns, catalog_sales LOJ catalog_returns and 
> web_sales LOJ web_returns. Each of these joins is estimated to produce only 
> a single row, so the result is BROADCAST, which causes hash table memory 
> errors.
> {code}
>          Reducer 12                                 |
> |             Execution mode: vectorized, llap       |
> |             Reduce Operator Tree:                  |
> +----------------------------------------------------+
> |                      Explain                       |
> +----------------------------------------------------+
> |               Map Join Operator                    |
> |                 condition map:                     |
> |                      Left Outer Join 0 to 1        |
> |                 keys:                              |
> |                   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 (type: bigint) |
> |                   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 (type: bigint) |
> |                 outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, _col8 |
> |                 input vertices:                    |
> |                   1 Map 14                         |
> |                 Statistics: Num rows: 10282477384 Data size: 534184867432 Basic stats: COMPLETE Column stats: COMPLETE |
> |                 Filter Operator                    |
> |                   predicate: _col8 is null (type: boolean) |
> |                  * Statistics: Num rows: 1* Data size: 52 Basic stats: COMPLETE Column stats: COMPLETE |
> {code}



