[ 
https://issues.apache.org/jira/browse/IMPALA-10116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated IMPALA-10116:
--------------------------------
    Description: 
Query 1 below uses 'casttobigint()' in the IS NOT NULL predicate. Its 
selectivity is computed as the default 10% of the input rows, resulting in 
cardinality = 7.30K. The equivalent predicate in Query 2, written with an 
explicit 'CAST' expr, yields the correct cardinality of 73.05K. 

Query 1:
{noformat}
Query: explain select * from date_dim d1, date_dim d2 where d1.d_week_seq = 
d2.d_week_seq - 52 and casttobigint(d1.d_week_seq) is not null and 
casttobigint(d2.d_week_seq) is not null
| 00:SCAN HDFS [tpcds.date_dim d1]                            |
|    HDFS partitions=1/1 files=1 size=9.84MB                  |
|    predicates: casttobigint(d1.d_week_seq) IS NOT NULL      |
|    runtime filters: RF000 -> d1.d_week_seq                  |
|    row-size=255B cardinality=7.30K                          |
+-------------------------------------------------------------+
{noformat}

Query 2:
{noformat}
Query: explain select * from date_dim d1, date_dim d2 where d1.d_week_seq = 
d2.d_week_seq - 52 and cast(d1.d_week_seq as bigint) is not null and 
cast(d2.d_week_seq as bigint) is not null 

| 00:SCAN HDFS [tpcds.date_dim d1]                            |
|    HDFS partitions=1/1 files=1 size=9.84MB                  |
|    predicates: CAST(d1.d_week_seq AS BIGINT) IS NOT NULL    |
|    runtime filters: RF000 -> d1.d_week_seq                  |
|    row-size=255B cardinality=73.05K                         |
+-------------------------------------------------------------+
{noformat}
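
The gap between the two estimates is consistent with a simple model in which the scan's cardinality is the input row count times the predicate's selectivity. The sketch below only illustrates that arithmetic and is not Impala's planner code: the class and method names are hypothetical, the row count of 73,049 is assumed from the 73.05K shown above, and a null count of 0 for d_week_seq is an assumption.

```java
public class CardinalitySketch {
    // Assumed fallback selectivity for predicates the planner cannot analyze
    // (the "default 10%" mentioned in the description).
    static final double DEFAULT_SELECTIVITY = 0.1;

    // Estimated output rows = input rows * selectivity, rounded to the nearest row.
    static long estimate(long inputRows, double selectivity) {
        return Math.round(inputRows * selectivity);
    }

    // Selectivity of IS NOT NULL derived from a column's null count.
    static double notNullSelectivity(long inputRows, long numNulls) {
        return (double) (inputRows - numNulls) / inputRows;
    }

    public static void main(String[] args) {
        long rows = 73_049L;  // assumption: row count of tpcds.date_dim (shown as 73.05K)
        long numNulls = 0L;   // assumption: column stats report no NULLs in d_week_seq

        // Query 1: the predicate falls back to the default selectivity.
        System.out.println(estimate(rows, DEFAULT_SELECTIVITY));                  // 7305
        // Query 2: the predicate is estimated from the column's null count.
        System.out.println(estimate(rows, notNullSelectivity(rows, numNulls)));  // 73049
    }
}
```

Under these assumptions the two results match the 7.30K and 73.05K reported in the plans above.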

Query 1 should ideally produce the same cardinality as Query 2. Note that to 
run Query 1 I had to comment out the following lines in FunctionCallExpr.java, 
because a user query is not supposed to call the builtin cast function 
directly. However, for an external frontend module that calls functions in 
impala-frontend.jar, this is supported, and we should make the behavior 
consistent.
{noformat}
+//    if (isBuiltinCastFunction()) {
+//      throw new AnalysisException(toSql() +
+//          " is reserved for internal use only. Use 'cast(expr AS type)' instead.");
+//    }
{noformat}
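
One way to picture a consistent behavior is for the selectivity logic to look through either form of cast before consulting column statistics. The following is a toy model under stated assumptions, not the Impala frontend: the Expr class, the isCastWrapper and unwrapCasts helpers, and the rule that a builtin cast is any single-argument function whose name starts with "castto" are all hypothetical and chosen only for illustration.

```java
import java.util.List;

public class CastLookThroughSketch {
    // Toy expression node: a name plus children. A column reference has no children.
    static final class Expr {
        final String name;
        final List<Expr> children;

        Expr(String name, List<Expr> children) {
            this.name = name;
            this.children = children;
        }
    }

    // Assumption: an explicit CAST node and a single-argument "castto*" function
    // call are both treated as cast wrappers around their only child.
    static boolean isCastWrapper(Expr e) {
        return e.children.size() == 1
                && (e.name.equals("CAST") || e.name.startsWith("castto"));
    }

    // Strip any number of nested cast wrappers and return the underlying expression.
    static Expr unwrapCasts(Expr e) {
        while (isCastWrapper(e)) {
            e = e.children.get(0);
        }
        return e;
    }

    public static void main(String[] args) {
        Expr col = new Expr("d1.d_week_seq", List.of());
        Expr explicitCast = new Expr("CAST", List.of(col));
        Expr builtinCast = new Expr("casttobigint", List.of(col));

        // Both forms resolve to the same column, so both could use the same stats.
        System.out.println(unwrapCasts(explicitCast).name);
        System.out.println(unwrapCasts(builtinCast).name);
    }
}
```

In this model both predicates reach the same column reference, so an IS NOT NULL estimate based on that column's null count would be identical for the two queries.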



> Builtin cast function's selectivity is different from that of explicit cast
> ---------------------------------------------------------------------------
>
>                 Key: IMPALA-10116
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10116
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major


