Aman Sinha created IMPALA-10615:
-----------------------------------
Summary: Cardinality estimates for some scalar functions could be improved
Key: IMPALA-10615
URL: https://issues.apache.org/jira/browse/IMPALA-10615
Project: IMPALA
Issue Type: Bug
Components: Frontend
Affects Versions: Impala 3.4.0
Reporter: Aman Sinha
The default 10% selectivity applied to predicates involving most scalar
functions can produce a significant underestimate. Consider the following
cardinality estimate with UPPER():
{noformat}
[localhost:21050] tpch> explain select * from nation where upper(n_name) is not null;
| 00:SCAN HDFS [tpch.nation] |
| HDFS partitions=1/1 files=1 size=2.15KB |
| predicates: upper(n_name) IS NOT NULL |
| row-size=109B cardinality=3 |
+------------------------------------------------------------+
{noformat}
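As a rough check of the arithmetic behind that estimate, here is a minimal sketch. The names DEFAULT_SELECTIVITY and estimate_cardinality are hypothetical, and the round-half-up rule is an assumption chosen only because it is consistent with the plan output above; it is not taken from the planner source.

```python
import math

# Assumption: the 10% default described in this issue, expressed as a fraction.
DEFAULT_SELECTIVITY = 0.1


def estimate_cardinality(num_rows, selectivity=DEFAULT_SELECTIVITY):
    """Scale the input row count by a selectivity and round half up.

    The rounding rule is an assumption made for illustration.
    """
    return math.floor(num_rows * selectivity + 0.5)


# tpch.nation has 25 rows; 25 * 0.1 = 2.5, which rounds up to 3.
print(estimate_cardinality(25))
```

With a selectivity of 1.0 the same helper returns all 25 rows, which is the estimate one would expect for a predicate that filters nothing.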
Since n_name contains no NULL values, the actual cardinality is 25, which matches
the estimate for the same predicate applied directly to the column:
{noformat}
[localhost:21050] tpch> explain select * from nation where n_name is not null;
| 00:SCAN HDFS [tpch.nation] |
| HDFS partitions=1/1 files=1 size=2.15KB |
| predicates: n_name IS NOT NULL |
| row-size=109B cardinality=25 |
+------------------------------------------------------------+
{noformat}
In general, if a scalar function cannot change the nullability of its input, we
should compute the same selectivity for an IS [NOT] NULL predicate on the
function call as for the same predicate on the input itself.
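The rule above could be sketched roughly as follows. This is an illustration under stated assumptions, not Impala's implementation: the Column and FunctionCall classes, the NULL_PRESERVING_FUNCTIONS set, and the helper names are all hypothetical, and the column statistics are assumed to carry a row count and a NULL count.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Assumption: the 10% default described in this issue.
DEFAULT_SELECTIVITY = 0.1

# Hypothetical list of functions that return NULL only when their input is NULL.
NULL_PRESERVING_FUNCTIONS = {"upper", "lower"}


@dataclass(frozen=True)
class Column:
    name: str
    num_rows: int
    num_nulls: int


@dataclass(frozen=True)
class FunctionCall:
    fn_name: str
    arg: "Expr"


Expr = Union[Column, FunctionCall]


def underlying_column(expr: Expr) -> Optional[Column]:
    """Strip null-preserving calls; return the column, or None if any call is unknown."""
    while isinstance(expr, FunctionCall):
        if expr.fn_name.lower() not in NULL_PRESERVING_FUNCTIONS:
            return None
        expr = expr.arg
    return expr


def is_not_null_selectivity(expr: Expr) -> float:
    """Selectivity of `expr IS NOT NULL`, reusing column stats where that is safe."""
    col = underlying_column(expr)
    if col is None or col.num_rows == 0:
        return DEFAULT_SELECTIVITY  # fall back to the default guess
    return (col.num_rows - col.num_nulls) / col.num_rows


n_name = Column("n_name", num_rows=25, num_nulls=0)
print(is_not_null_selectivity(FunctionCall("upper", n_name)))   # same as for n_name itself
print(is_not_null_selectivity(FunctionCall("my_udf", n_name)))  # unknown function: default
```

In this sketch, upper(n_name) IS NOT NULL gets the same selectivity as n_name IS NOT NULL, while a function that is not known to preserve nullability still falls back to the default.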
Note that for explicit CAST, we do the right thing:
{noformat}
[localhost:21050] tpch> explain select * from nation where cast(n_name as varchar(10)) is not null;
| 00:SCAN HDFS [tpch.nation] |
| HDFS partitions=1/1 files=1 size=2.15KB |
| predicates: CAST(n_name AS VARCHAR(10)) IS NOT NULL |
| row-size=109B cardinality=25 |
{noformat}