[
https://issues.apache.org/jira/browse/IMPALA-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers updated IMPALA-8045:
--------------------------------
Summary: Rollup of Smaller Join Cardinality Issues (was: ScanNode
confusion between table and scan input cardinality)
> Rollup of Smaller Join Cardinality Issues
> -----------------------------------------
>
> Key: IMPALA-8045
> URL: https://issues.apache.org/jira/browse/IMPALA-8045
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.1.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
>
> The {{ScanNode}} class in the planner contains an {{inputCardinality_}} field
> used by join calculations as a proxy for the table size. However, the actual
> scan node implementations set {{inputCardinality_}} to the estimated
> number of rows *read* by the scan. That value is useful for understanding the
> physical scan structure, but joins need the base table cardinality.
> For example, the join may use the input cardinality to understand the
> reduction in rows due to filters in order to adjust the NDV of key columns.
> Because the input cardinality is the scan count, not the table row count,
> the math does not work out.
> The solution is to clarify the code to separate the idea of scan count from
> base table row count.
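
To make the mismatch concrete, here is a small numeric sketch. It is not Impala code: the helper {{adjusted_ndv}}, the linear scaling rule, and all of the row counts are assumptions chosen only to illustrate how the NDV adjustment changes when the denominator is the number of rows read by the scan rather than the base table row count.

```python
def adjusted_ndv(ndv, output_rows, base_rows):
    """Scale a key column's NDV by the fraction of rows that survive filtering.

    Hypothetical linear-scaling rule for illustration only; the planner's
    actual formula may differ.
    """
    if base_rows <= 0:
        return ndv
    # Multiply before dividing, and never report more distinct values
    # than the column has in the whole table.
    return min(ndv, ndv * output_rows / base_rows)


# Made-up numbers for illustration.
table_rows = 1_000_000   # base table row count
rows_read = 100_000      # rows the scan reads (assumed smaller than the table)
rows_out = 10_000        # rows the scan emits after its filters
key_ndv = 1_000_000      # NDV of the join key over the whole table

# Intended calculation: reduction measured against the base table.
ndv_from_table_rows = adjusted_ndv(key_ndv, rows_out, table_rows)

# What happens if the scan's read count is used as the "table size".
ndv_from_rows_read = adjusted_ndv(key_ndv, rows_out, rows_read)

print(ndv_from_table_rows, ndv_from_rows_read)
```

Under these assumed numbers, the second estimate is ten times larger than the first, because the filter reduction is measured against the wrong baseline. Keeping the two counts in separate fields would let the join pick the base table row count explicitly.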