[ https://issues.apache.org/jira/browse/SPARK-32683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182643#comment-17182643 ]

Varun Wachasapti J commented on SPARK-32683:
--------------------------------------------

I stumbled upon the same issue. On initial investigation, this looks like a change in
behaviour in java.time.format.DateTimeFormatterBuilder: the pattern letter 'F'
currently maps to ChronoField.ALIGNED_DAY_OF_WEEK_IN_MONTH instead of
ChronoField.ALIGNED_WEEK_OF_MONTH.

According to its definition, ALIGNED_DAY_OF_WEEK_IN_MONTH gives the position of the
day within a 7-day week aligned to the start of the month, not the number of the week.
Definition:
[https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/time/temporal/ChronoField.java#L373]
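
To illustrate the difference, here is a minimal sketch in plain Python that applies the arithmetic implied by the two ChronoField definitions to the test dates from the issue. It is not JDK or Spark code. The helper names are hypothetical, and the formulas are assumptions derived from the field descriptions: a day's position within 7-day blocks starting on the 1st, and the index of that block.

```python
from datetime import date, timedelta

def aligned_day_of_week_in_month(d: date) -> int:
    # Position of d (1..7) within 7-day blocks that start on the 1st of the month.
    return (d.day - 1) % 7 + 1

def aligned_week_of_month(d: date) -> int:
    # Index (1-based) of the 7-day block, aligned to the 1st, that contains d.
    return (d.day - 1) // 7 + 1

# Test dates from the issue: 2020-08-01 .. 2020-08-10
days = [date(2020, 8, 1) + timedelta(days=i) for i in range(10)]
print([aligned_day_of_week_in_month(d) for d in days])  # [1, 2, 3, 4, 5, 6, 7, 1, 2, 3]
print([aligned_week_of_month(d) for d in days])         # [1, 1, 1, 1, 1, 1, 1, 2, 2, 2]
```

The first list matches the values Spark 3.0.0 returns for F in the issue. ALIGNED_WEEK_OF_MONTH counts 7-day blocks from the 1st, so it gives 1 for August 1 to 7. That is still different from the Sunday-based numbering the reporter expects.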



I tried the same snippet in Spark 2.4.6, and its behaviour was in line with the
Spark documentation. This looks like a regression in Spark 3.x.

> Datetime Pattern F not working as expected
> ------------------------------------------
>
>                 Key: SPARK-32683
>                 URL: https://issues.apache.org/jira/browse/SPARK-32683
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.0
>         Environment: Windows 10 Pro
>  * with Jupyter Lab - Docker Image 
>  ** jupyter/all-spark-notebook:f1811928b3dd 
>  *** spark 3.0.0
>  *** python 3.8.5
>  *** openjdk 11.0.8
>            Reporter: Daeho Ro
>            Priority: Major
>         Attachments: comment.png
>
>
> h3. Background
> From the
> [documentation|https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html],
> the pattern F should give the week of the month.
> |*Symbol*|*Meaning*|*Presentation*|*Example*|
> |F|week-of-month|number(1)|3|
> h3. Test Data
> Here is my test data, a CSV file:
> {code:java}
> date
> 2020-08-01
> 2020-08-02
> 2020-08-03
> 2020-08-04
> 2020-08-05
> 2020-08-06
> 2020-08-07
> 2020-08-08
> 2020-08-09
> 2020-08-10 {code}
> h3. Steps to the bug
> I have tested this in Scala Spark 3.0.0 and PySpark 3.0.0:
> {code:java}
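> // Scala (spark-shell; spark.implicits._ enables the 'date column syntax)
> import org.apache.spark.sql.functions.{date_format, to_timestamp}
> import spark.implicits._
> df.withColumn("date", to_timestamp('date, "yyyy-MM-dd"))
>   .withColumn("week", date_format('date, "F")).show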
> +-------------------+----+
> |               date|week|
> +-------------------+----+
> |2020-08-01 00:00:00|   1|
> |2020-08-02 00:00:00|   2|
> |2020-08-03 00:00:00|   3|
> |2020-08-04 00:00:00|   4|
> |2020-08-05 00:00:00|   5|
> |2020-08-06 00:00:00|   6|
> |2020-08-07 00:00:00|   7|
> |2020-08-08 00:00:00|   1|
> |2020-08-09 00:00:00|   2|
> |2020-08-10 00:00:00|   3|
> +-------------------+----+
> # PySpark
> from pyspark.sql.functions import date_format, to_timestamp
> df.withColumn('date', to_timestamp('date', 'yyyy-MM-dd')) \
>   .withColumn('week', date_format('date', 'F')) \
>   .show(10, False)
> +-------------------+----+
> |date               |week|
> +-------------------+----+
> |2020-08-01 00:00:00|1   |
> |2020-08-02 00:00:00|2   |
> |2020-08-03 00:00:00|3   |
> |2020-08-04 00:00:00|4   |
> |2020-08-05 00:00:00|5   |
> |2020-08-06 00:00:00|6   |
> |2020-08-07 00:00:00|7   |
> |2020-08-08 00:00:00|1   |
> |2020-08-09 00:00:00|2   |
> |2020-08-10 00:00:00|3   |
> +-------------------+----+{code}
> h3. Expected result
> The `week` column does not contain the week of the month. It contains a day-of-week
> number that restarts every 7 days, counting from the 1st of the month.
>   !comment.png!
> From my calendar, August 1 should have week-of-month 1, August 2 to 8 should
> have 2, and so on.
> {code:java}
> +-------------------+----+
> |date               |week|
> +-------------------+----+
> |2020-08-01 00:00:00|1   |
> |2020-08-02 00:00:00|2   |
> |2020-08-03 00:00:00|2   |
> |2020-08-04 00:00:00|2   |
> |2020-08-05 00:00:00|2   |
> |2020-08-06 00:00:00|2   |
> |2020-08-07 00:00:00|2   |
> |2020-08-08 00:00:00|2   |
> |2020-08-09 00:00:00|3   |
> |2020-08-10 00:00:00|3   |
> +-------------------+----+{code}
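
The expected numbering above treats weeks as running Sunday to Saturday and counts the partial week containing the 1st as week 1. Below is a minimal plain-Python sketch of that rule. The helper is hypothetical and is not a Spark or JDK API. The Sunday-start convention is an assumption read off the expected table.

```python
from datetime import date, timedelta

def sunday_start_week_of_month(d: date) -> int:
    # Weeks run Sunday..Saturday; the (possibly partial) week holding the 1st is week 1.
    first = d.replace(day=1)
    # Python's weekday() is Monday=0 .. Sunday=6; shift to Sunday=0 .. Saturday=6.
    offset = (first.weekday() + 1) % 7
    return (d.day - 1 + offset) // 7 + 1

days = [date(2020, 8, 1) + timedelta(days=i) for i in range(10)]
print([sunday_start_week_of_month(d) for d in days])  # [1, 2, 2, 2, 2, 2, 2, 2, 3, 3]
```

August 1, 2020 was a Saturday, so it forms a one-day week 1, and week 2 starts on Sunday, August 2.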



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
