[ 
https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698469#comment-16698469
 ] 

Sujith edited comment on SPARK-26165 at 11/26/18 7:31 AM:
----------------------------------------------------------

This change was made as part of PR 
[https://github.com/apache/spark/pull/6888], which introduced string 
casting when the left/right expression type is Timestamp. For equality 
comparisons we were implicitly casting the right/left side string type 
expressions to Timestamp. 

There are also some test cases with similar usage, in which we cast the type 
to string:

!image-2018-11-26-13-01-28-299.png!

I thought I would just improve the logic as described above.

We hit this issue in a customer environment where a filter query was reported 
to be slow. After an initial analysis I found that we were casting the 
Timestamp column expression to string. 
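
To illustrate the difference described above, here is a minimal plain-Scala sketch (no Spark involved). The names `Order`, `filterViaStringCast` and `filterViaTimestamp` are hypothetical, and `LocalDateTime` only stands in for a timestamp column. It is not how Spark implements the filter; it only contrasts formatting every row value as a string with parsing the literal once and comparing timestamps.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object TimestampFilterSketch {
  // Assumed literal format, matching the one used in the reported query.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Hypothetical stand-in for a row of the `orders` table.
  final case class Order(username: String, orderCreationDate: LocalDateTime)

  // Mirrors the reported plan: every row value is turned into a string
  // and compared lexicographically with the literal.
  def filterViaStringCast(rows: Seq[Order], literal: String): Seq[String] =
    rows.filter(o => fmt.format(o.orderCreationDate).compareTo(literal) > 0)
      .map(_.username)

  // Mirrors the suggested behaviour: the literal is parsed once and the
  // row values are compared as timestamps, with no per-row formatting.
  def filterViaTimestamp(rows: Seq[Order], literal: String): Seq[String] = {
    val bound = LocalDateTime.parse(literal, fmt)
    rows.filter(_.orderCreationDate.isAfter(bound)).map(_.username)
  }

  val sampleOrders: Seq[Order] = Seq(
    Order("alice", LocalDateTime.parse("2017-02-26 13:45:11", fmt)),
    Order("bob", LocalDateTime.parse("2017-02-26 13:45:13", fmt)),
    Order("carol", LocalDateTime.parse("2018-03-18 12:39:40", fmt))
  )
}
```

For well-formed literals both functions select the same rows; the second one simply avoids the per-row conversion.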

 

 


was (Author: s71955):
This change was made as part of PR 
[https://github.com/apache/spark/pull/6888], which introduced string 
casting when the left/right expression type is Timestamp. For equality 
comparisons we were implicitly casting the right/left side string type 
expressions to Timestamp. 

There are also some test cases with similar usage, in which we cast the type 
to string:

!image-2018-11-26-13-01-28-299.png!

I thought I would just improve the logic as described above.

If this is not a valid use case, we can close the issue. I hit it in a 
customer environment where a filter query was reported to be slow. After an 
initial analysis I found that we were casting the Timestamp column expression 
to string. 

 

 

> Date and Timestamp column expression is getting converted to string in less 
> than/greater than filter query even though valid date/timestamp string 
> literal is used in the right side filter expression
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-26165
>                 URL: https://issues.apache.org/jira/browse/SPARK-26165
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer
>    Affects Versions: 2.3.2, 2.4.0
>            Reporter: Sujith
>            Priority: Major
>         Attachments: image-2018-11-26-13-00-36-896.png, 
> image-2018-11-26-13-01-28-299.png, timestamp_filter_perf.PNG
>
>
> Date and Timestamp column is getting converted to string in less than/greater 
> than filter query even though date strings that contain a time, like 
> '2018-03-18 12:39:40' to date. Besides it's not possible to cast a string 
> like '2018-03-18 12:39:40' to a timestamp.
>  
> scala> spark.sql("""explain extended SELECT username FROM orders WHERE 
> order_creation_date > '2017-02-26 13:45:12'""").show(false);
> +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> |== Parsed Logical Plan ==
> 'Project ['username]
> +- 'Filter ('order_creation_date > 2017-02-26 13:45:12)
>  +- 'UnresolvedRelation `orders`
> == Analyzed Logical Plan ==
> username: string
> Project [username#59]
> +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)
>  +- SubqueryAlias orders
>  +- HiveTableRelation `default`.`orders`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, 
> order_creation_date#60, amount#61]
> == Optimized Logical Plan ==
> Project [username#59]
> +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 
> as string) > 2017-02-26 13:45:12))
>  +- HiveTableRelation `default`.`orders`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, 
> order_creation_date#60, amount#61]
> == Physical Plan ==
> *(1) Project [username#59]
> +- *(1) Filter (isnotnull(order_creation_date#60) && 
> (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12))
>  +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation 
> `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> [username#59, order_creation
> +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
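
As a possible workaround sketch (not the fix proposed in this issue), the literal can be cast explicitly so that both sides of the comparison are already timestamps. The example below assumes spark-sql is on the classpath. The object name, the `analyzedPlan` helper and the in-memory `orders` data are hypothetical stand-ins for the Hive table in the report. It returns the analyzed plan so it can be checked whether the column is still wrapped in a cast to string.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

object ExplicitTimestampCastSketch {
  // Builds a small in-memory `orders` view (a stand-in for the Hive table)
  // and returns the analyzed plan of the filter with an explicit cast.
  def analyzedPlan(spark: SparkSession): String = {
    import spark.implicits._

    Seq(
      ("alice", Timestamp.valueOf("2017-02-26 13:45:11"), 10.0),
      ("bob", Timestamp.valueOf("2017-02-26 13:45:13"), 20.0)
    ).toDF("username", "order_creation_date", "amount")
      .createOrReplaceTempView("orders")

    // The literal is cast to TIMESTAMP, so the column itself should not
    // need an implicit cast to string.
    val df = spark.sql(
      """SELECT username FROM orders
        |WHERE order_creation_date > CAST('2017-02-26 13:45:12' AS TIMESTAMP)""".stripMargin)

    df.queryExecution.analyzed.toString
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("explicit-timestamp-cast-sketch")
      .getOrCreate()
    try println(analyzedPlan(spark)) finally spark.stop()
  }
}
```

Comparing this plan with the "Analyzed Logical Plan" above should show whether the cast of order_creation_date to string is gone.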



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

