[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698469#comment-16698469 ]
Sujith commented on SPARK-26165:
--------------------------------

This change was made as part of PR [https://github.com/apache/spark/pull/6888], which introduced casting to string when the left or right expression type is Timestamp. For equality comparisons, we were implicitly casting the string-typed expression on the other side to Timestamp. I thought we could improve the logic along the lines of my description above; if this is not a valid use case, we can close the issue. I ran into this in a customer environment where a filter query was reported as slow. After an initial analysis, I found that we were casting the Timestamp column expression to string.

> Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side filter expression
> ----------------------------------------------------------------------
>
>                 Key: SPARK-26165
>                 URL: https://issues.apache.org/jira/browse/SPARK-26165
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer
>    Affects Versions: 2.3.2, 2.4.0
>            Reporter: Sujith
>            Priority: Major
>         Attachments: timestamp_filter_perf.PNG
>
>
> Date and Timestamp column is getting converted to string in less than/greater than filter query even though date strings that contains a time, like '2018-03-18 12:39:40' to date. Besides it's not possible to cast a string like '2018-03-18 12:39:40' to a timestamp.
>
> scala> spark.sql("""explain extended SELECT username FROM orders WHERE order_creation_date > '2017-02-26 13:45:12'""").show(false);
> +----------------------------------------------------------------------
> |== Parsed Logical Plan ==
> 'Project ['username]
> +- 'Filter ('order_creation_date > 2017-02-26 13:45:12)
>    +- 'UnresolvedRelation `orders`
> == Analyzed Logical Plan ==
> username: string
> Project [username#59]
> +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)
>    +- SubqueryAlias orders
>       +- HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation_date#60, amount#61]
> == Optimized Logical Plan ==
> Project [username#59]
> +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12))
>    +- HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation_date#60, amount#61]
> == Physical Plan ==
> *(1) Project [username#59]
> +- *(1) Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12))
>    +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation
> +----------------------------------------------------------------------
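
One possible user-side workaround for the plan above is to make the literal's type explicit, so that both sides of the comparison are already timestamps and no cast on the column should be needed. The following is only a minimal sketch under stated assumptions: the object name, the local session, and the two sample rows are hypothetical stand-ins for the real Hive `orders` table, and the `CAST(... AS TIMESTAMP)` rewrite is a suggestion, not the change proposed in this issue.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: compare a timestamp column against an explicitly
// typed literal instead of a bare string literal.
object ExplicitTimestampLiteral {

  // Returns the analyzed plan as text plus the usernames that pass the filter.
  def run(): (String, Seq[String]) = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-26165-sketch")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data standing in for the Hive table from the report.
    Seq(
      ("alice", Timestamp.valueOf("2017-02-25 09:00:00"), 10.0),
      ("bob", Timestamp.valueOf("2017-02-27 18:30:00"), 20.0)
    ).toDF("username", "order_creation_date", "amount")
      .createOrReplaceTempView("orders")

    // The literal is cast once; the column keeps its timestamp type.
    val explicit = spark.sql(
      """SELECT username FROM orders
        |WHERE order_creation_date > CAST('2017-02-26 13:45:12' AS TIMESTAMP)""".stripMargin)

    val plan = explicit.queryExecution.analyzed.toString
    val users = explicit.collect().map(_.getString(0)).toSeq
    spark.stop()
    (plan, users)
  }

  def main(args: Array[String]): Unit = {
    val (plan, users) = run()
    println(plan)          // inspect where the cast is applied
    users.foreach(println) // rows that satisfy the filter
  }
}
```

In spark-shell the same query string can be passed to `spark.sql(...).explain(true)` against the real table to check whether the `cast(order_creation_date ... as string)` node disappears from the plan on the Spark version in use.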