Kent Yao created SPARK-31879:
--------------------------------

             Summary: First day of week changed for non-MONDAY_START Locales
                 Key: SPARK-31879
                 URL: https://issues.apache.org/jira/browse/SPARK-31879
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.0.0, 3.1.0
            Reporter: Kent Yao


h1. cases
{code:sql}
spark-sql> select to_timestamp('2020-1-1', 'YYYY-w-u');
2019-12-29 00:00:00
spark-sql> set spark.sql.legacy.timeParserPolicy=legacy;
spark.sql.legacy.timeParserPolicy       legacy
spark-sql> select to_timestamp('2020-1-1', 'YYYY-w-u');
2019-12-30 00:00:00
{code}

h1. reasons

These week-based fields need a Locale to express their semantics, because the 
first day of the week varies from country to country.

From the Javadoc of WeekFields:
{code:java}
    /**
     * Gets the first day-of-week.
     * <p>
     * The first day-of-week varies by culture.
     * For example, the US uses Sunday, while France and the ISO-8601 standard use Monday.
     * This method returns the first day using the standard {@code DayOfWeek} enum.
     *
     * @return the first day-of-week, not null
     */
    public DayOfWeek getFirstDayOfWeek() {
        return firstDayOfWeek;
    }
{code}
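
As a quick check of the locale dependence described in that Javadoc, here is a minimal sketch using only the standard java.time API. The class name FirstDayOfWeekDemo and the helper firstDay are made up for illustration and are not part of Spark.

```java
import java.time.DayOfWeek;
import java.time.temporal.WeekFields;
import java.util.Locale;

public class FirstDayOfWeekDemo {
    // Hypothetical helper: looks up the first day of the week for a locale.
    static DayOfWeek firstDay(Locale locale) {
        return WeekFields.of(locale).getFirstDayOfWeek();
    }

    public static void main(String[] args) {
        System.out.println(firstDay(Locale.US));                 // SUNDAY
        System.out.println(firstDay(Locale.UK));                 // MONDAY
        System.out.println(WeekFields.ISO.getFirstDayOfWeek());  // MONDAY
    }
}
```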

For SimpleDateFormat, however, the day-of-week number is not localized:

```
u       Day number of week (1 = Monday, ..., 7 = Sunday)        Number  1
```

Currently, the default locale we use is the US, where the week starts on 
Sunday, so the result moves one day backward (2019-12-29 instead of 
2019-12-30).
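
The shift can be reproduced outside Spark with plain java.time. This is a sketch under one assumption taken from this report: that 'u' ends up being handled as the localized 'e' field, so the pattern used here is 'YYYY-w-e'. The class name WeekPatternParseDemo and the helper parse are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class WeekPatternParseDemo {
    // Hypothetical helper: parses week-based-year, week number and the
    // localized day-of-week number ('e') under the given locale.
    static LocalDate parse(String text, Locale locale) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern("YYYY-w-e", locale);
        return LocalDate.parse(text, formatter);
    }

    public static void main(String[] args) {
        // Sunday-start locale: day 1 of week 1 is Sunday.
        System.out.println(parse("2020-1-1", Locale.US));  // 2019-12-29
        // Monday-start locale: day 1 of week 1 is Monday.
        System.out.println(parse("2020-1-1", Locale.UK));  // 2019-12-30
    }
}
```

The same input and pattern give two different dates, and only the locale differs between the two calls.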

For other countries, please refer to [First Day of the Week in Different 
Countries|http://chartsbin.com/view/41671]

h1. solution options

1. Use new Locale("en", "GB") as the default locale.
2. On JDK 10 and later, set the locale Unicode extension 'fw' to 'mon'; 
this does not work on earlier JDKs.
3. Forbid 'u' and give the user a proper exception, then enable and document 
'e'/'c'. Currently, 'u' is internally substituted by 'e', but the two are not 
equivalent.

Options 1 and 2 solve this for the default locale, but not for the functions 
that support a custom locale.
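
For option 2, a rough sketch of what building such a locale might look like. The language tag "en-US-u-fw-mon" and the class name FirstDayExtensionDemo are illustrative choices, not something Spark does today. Whether WeekFields honors the extension depends on the JDK (per the note above, JDK 10 and later), so the last line's output is deliberately left unstated.

```java
import java.time.temporal.WeekFields;
import java.util.Locale;

public class FirstDayExtensionDemo {
    // Hypothetical helper: a US locale carrying the Unicode 'fw' (first day
    // of week) extension set to Monday.
    static Locale mondayStartUs() {
        return Locale.forLanguageTag("en-US-u-fw-mon");
    }

    public static void main(String[] args) {
        Locale locale = mondayStartUs();
        // The extension is stored on the Locale regardless of JDK version.
        System.out.println(locale.getUnicodeLocaleType("fw"));  // mon
        // Whether this reflects the extension depends on the JDK in use.
        System.out.println(WeekFields.of(locale).getFirstDayOfWeek());
    }
}
```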

cc [~cloud_fan] [~dongjoon] [~maropu]




