[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698469#comment-16698469 ]
Sujith edited comment on SPARK-26165 at 11/26/18 7:31 AM:
----------------------------------------------------------

This change was made as part of PR [https://github.com/apache/spark/pull/6888], which introduced casting to string when the left or right expression type is Timestamp. For equality comparisons we were implicitly casting the string-typed expression on the other side to Timestamp. There are also existing test cases with similar usage, in which the type is cast to string:

!image-2018-11-26-13-01-28-299.png!

I thought I would just improve the logic as per my description above. We hit this issue in a customer environment where a filter query was reported to be slow; after an initial analysis I found that we were casting the Timestamp column expression to string.

was (Author: s71955):

This change was made as part of PR [https://github.com/apache/spark/pull/6888], which introduced casting to string when the left or right expression type is Timestamp. For equality comparisons we were implicitly casting the string-typed expression on the other side to Timestamp. There are also existing test cases with similar usage, in which the type is cast to string:

!image-2018-11-26-13-01-28-299.png!

I thought I would just improve the logic as per my description above. If this is not a valid use case, we can close the issue. I hit this issue in a customer environment where a filter query was reported to be slow; after an initial analysis I found that we were casting the Timestamp column expression to string.
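As a rough illustration of the behaviour described in this comment, the sketch below prints the analyzed plan for the issue's filter written with a bare string literal, and for the same filter with an explicit CAST on the literal. It is a minimal sketch under stated assumptions, not the analyzer change discussed here: the object name TimestampFilterSketch, its helper names, the sample rows, and the in-memory `orders` view (a stand-in for the Hive table in the issue) are all made up for the example. With the explicit cast both sides of the comparison are already timestamps, so no implicit coercion of the column is needed.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; every name here is illustrative only.
object TimestampFilterSketch {
  // Filter as written in the issue: timestamp column vs. bare string literal.
  val implicitQuery: String =
    "SELECT username FROM orders WHERE order_creation_date > '2017-02-26 13:45:12'"

  // Same filter, but the literal is cast explicitly, so the column is compared as a timestamp.
  val explicitCastQuery: String =
    "SELECT username FROM orders " +
      "WHERE order_creation_date > CAST('2017-02-26 13:45:12' AS TIMESTAMP)"

  // Registers a tiny in-memory view standing in for the Hive table `orders` (assumed data).
  def registerOrders(spark: SparkSession): Unit = {
    import spark.implicits._
    Seq(
      ("alice", Timestamp.valueOf("2017-02-26 13:45:11"), 10.0),
      ("bob", Timestamp.valueOf("2017-02-26 13:45:13"), 20.0)
    ).toDF("username", "order_creation_date", "amount")
      .createOrReplaceTempView("orders")
  }

  // Returns the analyzed logical plan as text, so any implicit casts become visible.
  def analyzedPlan(spark: SparkSession, sql: String): String =
    spark.sql(sql).queryExecution.analyzed.toString

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-26165-sketch")
      .getOrCreate()
    try {
      registerOrders(spark)
      println(analyzedPlan(spark, implicitQuery))
      println(analyzedPlan(spark, explicitCastQuery))
    } finally {
      spark.stop()
    }
  }
}
```

On the affected versions listed in the issue, the first plan would be expected to show the column wrapped in a cast to string, as in the explain output quoted in the issue description, while the second compares the column directly against a timestamp value. How the first query is coerced depends on the Spark version in use.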
> Date and Timestamp column expression is getting converted to string in less
> than/greater than filter query even though valid date/timestamp string
> literal is used in the right side filter expression
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-26165
>                 URL: https://issues.apache.org/jira/browse/SPARK-26165
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer
>    Affects Versions: 2.3.2, 2.4.0
>            Reporter: Sujith
>            Priority: Major
>         Attachments: image-2018-11-26-13-00-36-896.png,
> image-2018-11-26-13-01-28-299.png, timestamp_filter_perf.PNG
>
>
> Date and Timestamp column is getting converted to string in less than/greater
> than filter query even though date strings that contains a time, like
> '2018-03-18 12:39:40' to date. Besides it's not possible to cast a string
> like '2018-03-18 12:39:40' to a timestamp.
>
> scala> spark.sql("""explain extended SELECT username FROM orders WHERE order_creation_date > '2017-02-26 13:45:12'""").show(false);
> +-----------------------------------------------------------------------------
> |== Parsed Logical Plan ==
> 'Project ['username]
> +- 'Filter ('order_creation_date > 2017-02-26 13:45:12)
>    +- 'UnresolvedRelation `orders`
>
> == Analyzed Logical Plan ==
> username: string
> Project [username#59]
> +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)
>    +- SubqueryAlias orders
>       +- HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation_date#60, amount#61]
>
> == Optimized Logical Plan ==
> Project [username#59]
> +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12))
>    +- HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation_date#60, amount#61]
>
> == Physical Plan ==
> *(1) Project [username#59]
> +- *(1) Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12))
>    +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation
> +-----------------------------------------------------------------------------

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org