[jira] [Assigned] (IMPALA-13721) CPU costing for scan materialization is using wrong value for input cardinality

Riza Suminto (Jira) Fri, 28 Mar 2025 15:57:48 -0700


     [ 
https://issues.apache.org/jira/browse/IMPALA-13721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Riza Suminto reassigned IMPALA-13721:
-------------------------------------

    Assignee: Riza Suminto  (was: David Rorke)

> CPU costing for scan materialization is using wrong value for input 
> cardinality
> -------------------------------------------------------------------------------
>
>                 Key: IMPALA-13721
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13721
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: David Rorke
>            Assignee: Riza Suminto
>            Priority: Major
>
> When COMPUTE_PROCESSING_COST is enabled, the materialization cost for the 
> HDFS scan node is computed based on an estimate of bytes materialized 
> calculated as:
> {noformat}
> estBytes = (long) Math.ceil(avgRowDataSize * (double) inputCardinality)
> {noformat}
> where inputCardinality is currently the filtered input cardinality (after 
> accounting for runtime filters) returned by getFilteredInputCardinality().
> This is the correct approach when all runtime filters are "partition" filters 
> that skip reading entire files and row groups. However, if some or all of the 
> runtime filters are "row-level" filters that are applied after the rows are 
> materialized, then getFilteredInputCardinality() returns the cardinality 
> after that row-level filtering, so using it underestimates the 
> materialization cost.
> We should use an input cardinality here that reflects the estimated 
> cardinality after applying all runtime filters that eliminate data prior to 
> materialization (and ignoring the impact of runtime filters that are applied 
> after materialization).
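
To illustrate the size of the gap described above, here is a minimal Java sketch. Only the estBytes formula is taken from the issue; the class name, the preMaterializationCardinality() helper, the example selectivities, and the assumption that filter selectivities are independent and multiply are all hypothetical and are not Impala's actual planner code.

```java
// Hypothetical sketch, not Impala planner code. Only the estBytes formula
// comes from the issue text; every other name and number is an assumption.
final class ScanMaterializationCostSketch {

  /** The formula quoted in the issue description. */
  static long estBytes(double avgRowDataSize, long inputCardinality) {
    return (long) Math.ceil(avgRowDataSize * (double) inputCardinality);
  }

  /**
   * Hypothetical helper: scales the unfiltered cardinality by the selectivity
   * of each filter in the given array. Assumes selectivities are independent.
   */
  static long applySelectivities(long cardinality, double[] selectivities) {
    double result = (double) cardinality;
    for (double s : selectivities) {
      result *= s;
    }
    return (long) Math.ceil(result);
  }

  /**
   * Hypothetical helper: cardinality after only those runtime filters that
   * eliminate data before materialization (partition-style filters).
   */
  static long preMaterializationCardinality(
      long unfilteredCardinality, double[] partitionFilterSelectivities) {
    return applySelectivities(unfilteredCardinality, partitionFilterSelectivities);
  }

  public static void main(String[] args) {
    long unfiltered = 1_000_000L;     // assumed scan cardinality before filters
    double avgRowDataSize = 64.0;     // assumed average materialized row size
    double[] partitionFilters = {0.5};  // assumed: skips half the files/row groups
    double[] rowLevelFilters = {0.25};  // assumed: applied after materialization

    long preMat = preMaterializationCardinality(unfiltered, partitionFilters);
    // Cardinality after ALL runtime filters, analogous to what the issue says
    // getFilteredInputCardinality() currently returns.
    long fullyFiltered = applySelectivities(preMat, rowLevelFilters);

    System.out.println("estBytes using fully filtered cardinality: "
        + estBytes(avgRowDataSize, fullyFiltered));
    System.out.println("estBytes using pre-materialization cardinality: "
        + estBytes(avgRowDataSize, preMat));
  }
}
```

With these assumed numbers, the fully filtered cardinality yields a byte estimate that is a quarter of the estimate based on rows actually materialized, which is the underestimate the issue describes.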



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

