[ https://issues.apache.org/jira/browse/SPARK-40791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619653#comment-17619653 ]

Yang Jie edited comment on SPARK-40791 at 10/18/22 3:26 PM:
------------------------------------------------------------

Also cc [~srowen] [~rednaxelafx] [~dongjoon] [~cloud_fan] [~Qin Yao] [~yumwang]

Separately, I found the following in the migration guide:
 * In Spark 3.0, datetime pattern letter `F` is **aligned day of week in month** that represents the concept of the count of days within the period of a week where the weeks are aligned to the start of the month. In Spark version 2.4 and earlier, it is **week of month** that represents the concept of the count of weeks within the month where weeks start on a fixed day-of-week, e.g. `2020-07-30` is 30 days (4 weeks and 2 days) after the first day of the month, so `date_format(date '2020-07-30', 'F')` returns 2 in Spark 3.0, but as a week count in Spark 2.x, it returns 5 because it locates in the 5th week of July 2020, where week one is 2020-07-01 to 07-04.
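The two meanings quoted above correspond to two distinct `java.time` fields, so the documented values can be checked directly. A small illustration (my own sketch, not from the migration guide; the field names are the standard `ChronoField` constants):

```java
import java.time.LocalDate;
import java.time.temporal.ChronoField;

public class FMeaningDemo {
    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2020, 7, 30);
        // "aligned day of week in month": ((30 - 1) % 7) + 1 = 2
        System.out.println(d.get(ChronoField.ALIGNED_DAY_OF_WEEK_IN_MONTH)); // 2
        // "day of week in month" (matches the week-count-style value): ((30 - 1) / 7) + 1 = 5
        System.out.println(d.get(ChronoField.DAY_OF_WEEK_IN_MONTH)); // 5
    }
}
```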

 

It seems that the behavior in Spark 2.4 and earlier is the correct one. What do you think?

 



> The semantics of `F` in `DateTimeFormatter` have changed
> --------------------------------------------------------
>
>                 Key: SPARK-40791
>                 URL: https://issues.apache.org/jira/browse/SPARK-40791
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Yang Jie
>            Priority: Major
>
> {code:java}
> val createSql =
>   """
>     |create temporary view v as select col from values
>     | (timestamp '1582-06-01 11:33:33.123UTC+080000'),
>     | (timestamp '1970-01-01 00:00:00.000Europe/Paris'),
>     | (timestamp '1970-12-31 23:59:59.999Asia/Srednekolymsk'),
>     | (timestamp '1996-04-01 00:33:33.123Australia/Darwin'),
>     | (timestamp '2018-11-17 13:33:33.123Z'),
>     | (timestamp '2020-01-01 01:33:33.123Asia/Shanghai'),
>     | (timestamp '2100-01-01 01:33:33.123America/Los_Angeles') t(col)
>     | """.stripMargin
> sql(createSql)
> withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) {
>   val rows = sql("select col, date_format(col, 'F') from v").collect()
> >   // scalastyle:off println
> >   rows.foreach(println)
> >   // scalastyle:on println
> > } {code}
>  
> Before Java 19, the result is:
> {code:java}
> [1582-05-31 19:40:35.123,3]
> [1969-12-31 15:00:00.0,3]
> [1970-12-31 04:59:59.999,3]
> [1996-03-31 07:03:33.123,3]
> [2018-11-17 05:33:33.123,3]
> [2019-12-31 09:33:33.123,3]
> [2100-01-01 01:33:33.123,1] {code}
> With Java 19, the result is:
> {code:java}
> [1582-05-31 19:40:35.123,5]
> [1969-12-31 15:00:00.0,5]
> [1970-12-31 04:59:59.999,5]
> [1996-03-31 07:03:33.123,5]
> [2018-11-17 05:33:33.123,3]
> [2019-12-31 09:33:33.123,5]
> [2100-01-01 01:33:33.123,1] {code}
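The flip from 3 to 5 in the rows above appears to correspond to Java 19 remapping pattern letter `F` from the aligned field to day-of-week-in-month (my reading of the output, not stated in the ticket). For a day-of-month of 31 the two fields disagree, while for the 17th they happen to coincide, which is why the `2018-11-17` row is unchanged:

```java
import java.time.LocalDate;
import java.time.temporal.ChronoField;

public class OutputDiffDemo {
    public static void main(String[] args) {
        // Day 31: ((31 - 1) % 7) + 1 = 3, but ((31 - 1) / 7) + 1 = 5
        LocalDate d31 = LocalDate.of(1969, 12, 31);
        System.out.println(d31.get(ChronoField.ALIGNED_DAY_OF_WEEK_IN_MONTH)); // 3 (pre-Java-19 'F')
        System.out.println(d31.get(ChronoField.DAY_OF_WEEK_IN_MONTH));         // 5 (Java 19 'F')

        // Day 17: ((17 - 1) % 7) + 1 = 3 and ((17 - 1) / 7) + 1 = 3, so both versions agree
        LocalDate d17 = LocalDate.of(2018, 11, 17);
        System.out.println(d17.get(ChronoField.ALIGNED_DAY_OF_WEEK_IN_MONTH)); // 3
        System.out.println(d17.get(ChronoField.DAY_OF_WEEK_IN_MONTH));         // 3
    }
}
```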



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
