[ https://issues.apache.org/jira/browse/IMPALA-10116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Sinha updated IMPALA-10116:
--------------------------------
Description:
Query 1 below uses 'casttobigint()' in the IS NOT NULL predicate. The planner
falls back to the default selectivity of 10% of the input rows for it, which
gives cardinality = 7.3K. Query 2 writes the same predicate with an explicit
'CAST' expr and gets the correct cardinality of 73.05K.
Query 1:
{noformat}
Query: explain select * from date_dim d1, date_dim d2 where d1.d_week_seq =
d2.d_week_seq - 52 and casttobigint(d1.d_week_seq) is not null and
casttobigint(d2.d_week_seq) is not null
|
| 00:SCAN HDFS [tpcds.date_dim d1] |
| HDFS partitions=1/1 files=1 size=9.84MB |
| predicates: casttobigint(d1.d_week_seq) IS NOT NULL |
| runtime filters: RF000 -> d1.d_week_seq |
| row-size=255B cardinality=7.30K |
+-------------------------------------------------------------+
{noformat}
Query 2:
{noformat}
Query: explain select * from date_dim d1, date_dim d2 where d1.d_week_seq =
d2.d_week_seq - 52 and cast(d1.d_week_seq as bigint) is not null and
cast(d2.d_week_seq as bigint) is not null
| 00:SCAN HDFS [tpcds.date_dim d1] |
| HDFS partitions=1/1 files=1 size=9.84MB |
| predicates: CAST(d1.d_week_seq AS BIGINT) IS NOT NULL |
| runtime filters: RF000 -> d1.d_week_seq |
| row-size=255B cardinality=73.05K |
+-------------------------------------------------------------+
{noformat}
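The two estimates differ by exactly the predicate selectivity. Below is a minimal Java sketch of that arithmetic; it is not Impala's planner code. The 0.1 default comes from the description above. The input row count of 73,050 and the null count of 0 for d_week_seq are assumptions read off the 73.05K figure in the Query 2 plan, and the class and method names are made up for illustration.

```java
// Illustrative arithmetic only; names are hypothetical, not Impala APIs.
final class CardinalitySketch {
    // Fallback selectivity, taken from the "default 10%" in the description.
    static final double DEFAULT_SELECTIVITY = 0.1;

    // Estimated output rows = input rows * predicate selectivity, rounded.
    static long estimateRows(long inputRows, double selectivity) {
        return Math.round(inputRows * selectivity);
    }

    // Selectivity of "col IS NOT NULL" when the column's null count is known.
    static double isNotNullSelectivity(long inputRows, long nullCount) {
        if (inputRows <= 0) {
            return 0.0;
        }
        return (double) (inputRows - nullCount) / inputRows;
    }

    public static void main(String[] args) {
        long rows = 73_050L;  // assumption: approximates the 73.05K in the plan
        long nulls = 0L;      // assumption: no NULLs in d_week_seq

        // Query 1 path: predicate not recognized, default selectivity applies.
        System.out.println(estimateRows(rows, DEFAULT_SELECTIVITY));  // 7305

        // Query 2 path: IS NOT NULL selectivity derived from the null count.
        System.out.println(estimateRows(rows, isNotNullSelectivity(rows, nulls)));  // 73050
    }
}
```

With these inputs the default path yields about 7.3K rows and the stats-based path yields about 73.05K, matching the two plans.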
Query 1 should ideally provide the same cardinality as Query 2. Note that to
run Query 1 I had to comment out the following check in FunctionCallExpr.java,
which rejects a user query that calls the builtin cast function directly.
However, an external frontend module that calls functions in
impala-frontend.jar is allowed to do this, so we should make the behavior
consistent.
{noformat}
+// if (isBuiltinCastFunction()) {
+// throw new AnalysisException(toSql() +
+// " is reserved for internal use only. Use 'cast(expr AS type)' instead.");
+// }
{noformat}
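One possible reading of "make the behavior consistent" is to treat a builtin cast call as its explicit CAST equivalent before selectivity is estimated. The Java sketch below only illustrates that idea on strings and is not a proposed patch. The 'castto' name prefix is an assumption inferred from 'casttobigint' in Query 1, and the class and method names are hypothetical, not part of impala-frontend.jar.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not Impala code: maps a builtin cast function name
// to the explicit CAST form shown in the Query 2 plan.
final class BuiltinCastSketch {
    // Assumed naming convention, inferred from 'casttobigint' in Query 1.
    static final String CAST_PREFIX = "castto";

    // Returns the upper-cased target type if fnName looks like a builtin cast
    // function, e.g. "casttobigint" -> "BIGINT"; otherwise returns empty.
    static Optional<String> castTargetType(String fnName) {
        String lower = fnName.toLowerCase(Locale.ROOT);
        if (!lower.startsWith(CAST_PREFIX) || lower.length() == CAST_PREFIX.length()) {
            return Optional.empty();
        }
        return Optional.of(lower.substring(CAST_PREFIX.length()).toUpperCase(Locale.ROOT));
    }

    // Renders "casttobigint(arg)" as "CAST(arg AS BIGINT)"; any other
    // single-argument call is rendered unchanged.
    static String toExplicitCastSql(String fnName, String argSql) {
        return castTargetType(fnName)
            .map(type -> "CAST(" + argSql + " AS " + type + ")")
            .orElse(fnName + "(" + argSql + ")");
    }

    public static void main(String[] args) {
        System.out.println(toExplicitCastSql("casttobigint", "d1.d_week_seq"));
    }
}
```

If both spellings were normalized to one form like this, the IS NOT NULL predicates in Query 1 and Query 2 would reach the selectivity computation as the same expression.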
was:
Query 1 below uses 'casttobigint()' in the IS NOT NULL predicate and its
selectivity is computed as the default 10% of the input rows, resulting in
cardinality = 7.3K. The predicate in Query 2 with 'CAST' expr computes the
correct cardinality of 73.05K.
Query 1:
{noformat}
Query: explain select * from date_dim d1, date_dim d2 where d1.d_week_seq =
d2.d_week_seq - 52 and casttobigint(d1.d_week_seq) is not null and
casttobigint(d2.d_week_seq) is not null
|
| 00:SCAN HDFS [tpcds.date_dim d1] |
| HDFS partitions=1/1 files=1 size=9.84MB |
| predicates: casttobigint(d1.d_week_seq) IS NOT NULL |
| runtime filters: RF000 -> d1.d_week_seq |
| row-size=255B cardinality=7.30K |
+-------------------------------------------------------------+
{noformat}
Query 2:
{noformat}
Query: explain select * from date_dim d1, date_dim d2 where d1.d_week_seq =
d2.d_week_seq - 52 and cast(d1.d_week_seq as bigint) is not null and
cast(d2.d_week_seq as bigint) is not null
{noformat} |
| 00:SCAN HDFS [tpcds.date_dim d1] |
| HDFS partitions=1/1 files=1 size=9.84MB |
| predicates: CAST(d1.d_week_seq AS BIGINT) IS NOT NULL |
| runtime filters: RF000 -> d1.d_week_seq |
| row-size=255B cardinality=73.05K |
+-------------------------------------------------------------+
{noformat}
Query 1 should ideally provide the same cardinality as Query 2. Note that I
had to comment out the following lines in FunctionCallExpr.java because a user
query is not supposed to directly call the builtin cast function. However, for
an external frontend module that calls functions in impala-frontend.jar, this
is supported and we should make the behavior consistent.
{noformat}
+// if (isBuiltinCastFunction()) {
+// throw new AnalysisException(toSql() +
+// " is reserved for internal use only. Use 'cast(expr AS type)' instead.");
+// }
{noformat}
> Builtin cast function's selectivity is different from that of explicit cast
> ---------------------------------------------------------------------------
>
> Key: IMPALA-10116
> URL: https://issues.apache.org/jira/browse/IMPALA-10116
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 3.4.0
> Reporter: Aman Sinha
> Assignee: Aman Sinha
> Priority: Major
>
> Query 1 below uses 'casttobigint()' in the IS NOT NULL predicate. The planner
> falls back to the default selectivity of 10% of the input rows for it, which
> gives cardinality = 7.3K. Query 2 writes the same predicate with an explicit
> 'CAST' expr and gets the correct cardinality of 73.05K.
> Query 1:
> {noformat}
> Query: explain select * from date_dim d1, date_dim d2 where d1.d_week_seq =
> d2.d_week_seq - 52 and casttobigint(d1.d_week_seq) is not null and
> casttobigint(d2.d_week_seq) is not null
> |
> | 00:SCAN HDFS [tpcds.date_dim d1] |
> | HDFS partitions=1/1 files=1 size=9.84MB |
> | predicates: casttobigint(d1.d_week_seq) IS NOT NULL |
> | runtime filters: RF000 -> d1.d_week_seq |
> | row-size=255B cardinality=7.30K |
> +-------------------------------------------------------------+
> {noformat}
> Query 2:
> {noformat}
> Query: explain select * from date_dim d1, date_dim d2 where d1.d_week_seq =
> d2.d_week_seq - 52 and cast(d1.d_week_seq as bigint) is not null and
> cast(d2.d_week_seq as bigint) is not null
> | 00:SCAN HDFS [tpcds.date_dim d1] |
> | HDFS partitions=1/1 files=1 size=9.84MB |
> | predicates: CAST(d1.d_week_seq AS BIGINT) IS NOT NULL |
> | runtime filters: RF000 -> d1.d_week_seq |
> | row-size=255B cardinality=73.05K |
> +-------------------------------------------------------------+
> {noformat}
> Query 1 should ideally provide the same cardinality as Query 2. Note that to
> run Query 1 I had to comment out the following check in FunctionCallExpr.java,
> which rejects a user query that calls the builtin cast function directly.
> However, an external frontend module that calls functions in
> impala-frontend.jar is allowed to do this, so we should make the behavior
> consistent.
> {noformat}
> +// if (isBuiltinCastFunction()) {
> +// throw new AnalysisException(toSql() +
> +// " is reserved for internal use only. Use 'cast(expr AS type)' instead.");
> +// }
> {noformat}