[ 
https://issues.apache.org/jira/browse/SPARK-32683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184031#comment-17184031
 ] 

Daeho Ro edited comment on SPARK-32683 at 8/25/20, 1:27 PM:
------------------------------------------------------------

I did not mean to change the documentation; I meant to change the source, or to 
recover the `W` pattern in DateFormatter. In any case, that function is already 
gone (it was gone even before), and the documentation is now clear rather than 
confusing. 


was (Author: lamanus):
I did not mean to change the doc but the source or recover the DateFormatter W 
but anyway, the function is gone (even before) and the documentation is now 
clear, not confused. 

> Datetime Pattern F not working as expected
> ------------------------------------------
>
>                 Key: SPARK-32683
>                 URL: https://issues.apache.org/jira/browse/SPARK-32683
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>         Environment: Windows 10 Pro
>  * with Jupyter Lab - Docker Image 
>  ** jupyter/all-spark-notebook:f1811928b3dd 
>  *** spark 3.0.0
>  *** python 3.8.5
>  *** openjdk 11.0.8
>            Reporter: Daeho Ro
>            Assignee: Kent Yao
>            Priority: Major
>             Fix For: 3.0.1, 3.1.0
>
>         Attachments: comment.png
>
>
> h3. Background
> From the 
> [documentation|https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html],
>  the pattern F should give a week of the month.
> |*Symbol*|*Meaning*|*Presentation*|*Example*|
> |F|week-of-month|number(1)|3|
> h3. Test Data
> Here is my test data, that is a csv file.
> {code}
> date
> 2020-08-01
> 2020-08-02
> 2020-08-03
> 2020-08-04
> 2020-08-05
> 2020-08-06
> 2020-08-07
> 2020-08-08
> 2020-08-09
> 2020-08-10 {code}
> h3. Steps to the bug
> I have tested in the scala spark 3.0.0 and pyspark 3.0.0:
> {code:scala}
> // Spark
> df.withColumn("date", to_timestamp('date, "yyyy-MM-dd"))
>   .withColumn("week", date_format('date, "F")).show
> +-------------------+----+
> |               date|week|
> +-------------------+----+
> |2020-08-01 00:00:00|   1|
> |2020-08-02 00:00:00|   2|
> |2020-08-03 00:00:00|   3|
> |2020-08-04 00:00:00|   4|
> |2020-08-05 00:00:00|   5|
> |2020-08-06 00:00:00|   6|
> |2020-08-07 00:00:00|   7|
> |2020-08-08 00:00:00|   1|
> |2020-08-09 00:00:00|   2|
> |2020-08-10 00:00:00|   3|
> +-------------------+----+{code}
> {code:python}
> # pyspark
> df.withColumn('date', to_timestamp('date', 'yyyy-MM-dd')) \
>   .withColumn('week', date_format('date', 'F')) \
>   .show(10, False)
> +-------------------+----+
> |date               |week|
> +-------------------+----+
> |2020-08-01 00:00:00|1   |
> |2020-08-02 00:00:00|2   |
> |2020-08-03 00:00:00|3   |
> |2020-08-04 00:00:00|4   |
> |2020-08-05 00:00:00|5   |
> |2020-08-06 00:00:00|6   |
> |2020-08-07 00:00:00|7   |
> |2020-08-08 00:00:00|1   |
> |2020-08-09 00:00:00|2   |
> |2020-08-10 00:00:00|3   |
> +-------------------+----+{code}
> h3. Expected result
> The `week` column does not contain the week of the month. Its values simply 
> cycle from 1 to 7 with the day of the month, as if it were a day-of-week 
> number.
>   !comment.png!
> According to my calendar, the first day of August 2020 should have 
> week-of-month 1, the 2nd through the 8th should have 2, and so on.
> {code:java}
> +-------------------+----+
> |date               |week|
> +-------------------+----+
> |2020-08-01 00:00:00|1   |
> |2020-08-02 00:00:00|2   |
> |2020-08-03 00:00:00|2   |
> |2020-08-04 00:00:00|2   |
> |2020-08-05 00:00:00|2   |
> |2020-08-06 00:00:00|2   |
> |2020-08-07 00:00:00|2   |
> |2020-08-08 00:00:00|2   |
> |2020-08-09 00:00:00|3   |
> |2020-08-10 00:00:00|3   |
> +-------------------+----+{code}
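The two tables in the report can be reproduced outside Spark with a short sketch. The first list below computes `((day_of_month - 1) % 7) + 1`, which is what java.time's pattern letter `F` (aligned day-of-week in month) evaluates to, and it matches the observed `week` column exactly; the second computes the calendar week of the month with Sunday as the first day of the week, matching the expected table. This is an illustrative stdlib-only sketch, not Spark code; the variable names are mine.

```python
# Contrast what Spark 3.0's pattern 'F' actually produced with what the
# reporter expected, for 2020-08-01 through 2020-08-10.
import calendar

days = range(1, 11)  # days of August 2020

# Observed behavior: java.time's "aligned day-of-week in month",
# i.e. ((day_of_month - 1) % 7) + 1, which cycles 1..7.
aligned_dow_in_month = [(d - 1) % 7 + 1 for d in days]
print(aligned_dow_in_month)  # [1, 2, 3, 4, 5, 6, 7, 1, 2, 3]

# Expected behavior: week of the month, with weeks starting on Sunday
# (2020-08-01 was a Saturday, so it sits alone in week 1).
cal = calendar.Calendar(firstweekday=6)  # 6 = Sunday
weeks = cal.monthdayscalendar(2020, 8)   # rows of day numbers, 0 = padding
week_of_month = [next(i + 1 for i, w in enumerate(weeks) if d in w)
                 for d in days]
print(week_of_month)  # [1, 2, 2, 2, 2, 2, 2, 2, 3, 3]
```

The first list matching the ticket's observed output supports the resolution here: the documentation row for `F` was wrong, not the formatter.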



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
