[ https://issues.apache.org/jira/browse/SPARK-40791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619653#comment-17619653 ]
Yang Jie commented on SPARK-40791:
----------------------------------
Also cc [~srowen] [~rednaxelafx] [~dongjoon] [~cloud_fan] [~Qin Yao]
Meanwhile, I found the following in the migration guide:
* In Spark 3.0, datetime pattern letter `F` is **aligned day of week in
month**, which represents the count of days within the period of a week where
the weeks are aligned to the start of the month. In Spark version 2.4 and
earlier, it is **week of month**, which represents the count of weeks within
the month where weeks start on a fixed day-of-week. For example, `2020-07-30`
is 30 days (4 weeks and 2 days) after the first day of the month, so
`date_format(date '2020-07-30', 'F')` returns 2 in Spark 3.0, but as a week
count in Spark 2.x it returns 5, because it falls in the 5th week of July
2020, where week one is 2020-07-01 to 07-04.
It seems that under Java 19 the behavior of Spark 2.4 and earlier is the
correct one. What do you think?
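The two interpretations can be reproduced with plain java.time, independent of Spark. A minimal sketch (the class name `FSemantics` is mine), assuming — as the outputs below suggest — that Java 19 remapped pattern letter `F` from `ALIGNED_DAY_OF_WEEK_IN_MONTH` to `DAY_OF_WEEK_IN_MONTH`:

```java
import java.time.LocalDate;
import java.time.temporal.ChronoField;

public class FSemantics {
    // Pre-Java-19 reading of `F`: aligned day-of-week in month. Weeks are
    // 7-day blocks anchored to the 1st of the month, so the value is
    // ((dayOfMonth - 1) % 7) + 1.
    static int aligned(LocalDate d) {
        return d.get(ChronoField.ALIGNED_DAY_OF_WEEK_IN_MONTH);
    }

    // Java 19 reading of `F`: day-of-week in month, i.e. the ordinal of this
    // weekday within its month ("the 5th Thursday"), which is
    // ((dayOfMonth - 1) / 7) + 1.
    static int ordinal(LocalDate d) {
        return d.get(ChronoField.DAY_OF_WEEK_IN_MONTH);
    }

    public static void main(String[] args) {
        // The migration-guide example: 2020-07-30.
        LocalDate d = LocalDate.of(2020, 7, 30);
        // aligned = 2 (the documented Spark 3.x value), ordinal = 5
        System.out.println("aligned=" + aligned(d) + " ordinal=" + ordinal(d));
    }
}
```

For the rows below, e.g. 1582-05-31 (day 31): aligned = ((31-1) % 7) + 1 = 3, ordinal = ((31-1) / 7) + 1 = 5, matching the pre-19 and Java 19 outputs respectively.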
> The semantics of `F` in `DateTimeFormatter` have changed
> --------------------------------------------------------
>
> Key: SPARK-40791
> URL: https://issues.apache.org/jira/browse/SPARK-40791
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Yang Jie
> Priority: Major
>
> {code:java}
> val createSql =
>   """
>     |create temporary view v as select col from values
>     | (timestamp '1582-06-01 11:33:33.123UTC+080000'),
>     | (timestamp '1970-01-01 00:00:00.000Europe/Paris'),
>     | (timestamp '1970-12-31 23:59:59.999Asia/Srednekolymsk'),
>     | (timestamp '1996-04-01 00:33:33.123Australia/Darwin'),
>     | (timestamp '2018-11-17 13:33:33.123Z'),
>     | (timestamp '2020-01-01 01:33:33.123Asia/Shanghai'),
>     | (timestamp '2100-01-01 01:33:33.123America/Los_Angeles') t(col)
>     | """.stripMargin
> sql(createSql)
> withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) {
>   val rows = sql("select col, date_format(col, 'F') from v").collect()
>   // scalastyle:off
>   rows.foreach(println)
> } {code}
>
> With Java 18 and earlier, the result is
>
> {code:java}
> [1582-05-31 19:40:35.123,3]
> [1969-12-31 15:00:00.0,3]
> [1970-12-31 04:59:59.999,3]
> [1996-03-31 07:03:33.123,3]
> [2018-11-17 05:33:33.123,3]
> [2019-12-31 09:33:33.123,3]
> [2100-01-01 01:33:33.123,1] {code}
> With Java 19, the result is
>
> {code:java}
> [1582-05-31 19:40:35.123,5]
> [1969-12-31 15:00:00.0,5]
> [1970-12-31 04:59:59.999,5]
> [1996-03-31 07:03:33.123,5]
> [2018-11-17 05:33:33.123,3]
> [2019-12-31 09:33:33.123,5]
> [2100-01-01 01:33:33.123,1] {code}