PHILO-HE commented on code in PR #5240:
URL: https://github.com/apache/incubator-gluten/pull/5240#discussion_r1550934905
##########
backends-velox/src/test/scala/org/apache/gluten/execution/TestOperator.scala:
##########
@@ -1236,4 +1236,34 @@ class TestOperator extends VeloxWholeStageTransformerSuite {
}
}
}
+
+ test("Cast date to string") {
+ withTempPath {
+ path =>
+ Seq("2023-01-01", "2023-01-02", "2023-01-03")
+ .toDF("dateColumn")
+ .select(to_date($"dateColumn", "yyyy-MM-dd").as("dateColumn"))
+ .write
+ .parquet(path.getCanonicalPath)
+ spark.read.parquet(path.getCanonicalPath).createOrReplaceTempView("view")
+ runQueryAndCompare("SELECT cast(dateColumn as string) from view") {
+ checkGlutenOperatorMatch[ProjectExecTransformer]
+ }
+ }
+ }
+
+ test("Cast date to timestamp") {
Review Comment:
> Spark's conversion of the date type to timestamp is only precise to
the day, and will not cause problems in different time zones.
spark.sql("select cast(date'2023-01-02 01:01:01' as timestamp) as ts").show
+-------------------+
|                 ts|
+-------------------+
|2023-01-02 00:00:00|
+-------------------+
@dcoliversun, let me help clarify a bit. Actually, the time zone does matter
when casting a date to a timestamp: the date value is adjusted according to the
configured local time zone during the cast. And in Spark, a timestamp value
always corresponds to the UTC+0 time zone.
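To make the adjustment concrete, here is a minimal sketch that models the cast with plain `java.time` rather than Spark itself. It assumes the cast is equivalent to taking midnight of the date in the configured zone and storing the resulting UTC instant; `DateToTimestampSketch` and `dateToInstant` are hypothetical names used only for illustration.

```scala
import java.time.{Instant, LocalDate, ZoneId}

object DateToTimestampSketch {
  // Hypothetical helper: midnight of `date` in `zone`, expressed as a UTC instant.
  // This models the date-to-timestamp cast; it is not Spark's implementation.
  def dateToInstant(date: LocalDate, zone: ZoneId): Instant =
    date.atStartOfDay(zone).toInstant

  def main(args: Array[String]): Unit = {
    val date = LocalDate.of(2023, 1, 2)
    // The same date maps to different UTC instants under different zones.
    println(dateToInstant(date, ZoneId.of("Asia/Shanghai"))) // 2023-01-01T16:00:00Z
    println(dateToInstant(date, ZoneId.of("UTC")))           // 2023-01-02T00:00:00Z
  }
}
```

The two printed instants differ by eight hours, which is the kind of difference that would show up in the stored timestamp value under different session time zones.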
The reason the above SQL returns a result without a time zone adjustment is
that the printed result is produced by implicitly casting the timestamp to a
string, where the local time zone is also considered. See [spark
code](https://github.com/apache/spark/blob/v3.3.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L358).
This also explains the phenomenon you mentioned: "In spark, timestamp behaves
differently in df.show and df.collect".
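The round trip can be sketched the same way, again with plain `java.time` as a stand-in for Spark. The assumption is that rendering a timestamp as a string converts the stored UTC instant back to wall-clock time in the local zone; `ShowVsRawSketch` and `render` are hypothetical names.

```scala
import java.time.{Instant, LocalDate, ZoneId}
import java.time.format.DateTimeFormatter

object ShowVsRawSketch {
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Hypothetical helper: render a UTC instant as wall-clock time in `zone`.
  def render(instant: Instant, zone: ZoneId): String =
    fmt.format(instant.atZone(zone))

  def main(args: Array[String]): Unit = {
    val zone = ZoneId.of("Asia/Shanghai")
    // Date -> timestamp: midnight in the local zone, stored as a UTC instant.
    val stored = LocalDate.of(2023, 1, 2).atStartOfDay(zone).toInstant

    // Rendering in the same zone undoes the shift, so the output looks unadjusted.
    println(render(stored, zone))               // 2023-01-02 00:00:00
    // Rendering the same instant in UTC exposes the adjustment.
    println(render(stored, ZoneId.of("UTC")))   // 2023-01-01 16:00:00
  }
}
```

Because the two conversions cancel out when the same zone is used, a string rendering alone cannot show whether the cast applied the time zone.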
We can let Spark write the timestamp result to parquet, and then print the
parquet content to see the difference when different time zones are configured.
Or, just check the difference in the returned timestamp dataframe, as your added
test does.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]