[ 
https://issues.apache.org/jira/browse/SPARK-24969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentino Pinna updated SPARK-24969:
------------------------------------
    Description: 
The locale for {{org.apache.spark.sql.catalyst.util.DateTimeUtils}}, which is 
used internally by the {{to_date}} SQL function, is hard-coded to {{Locale.US}}.

This makes it impossible to parse a dataset whose dates are written in another 
language (Italian, in this case).
{code:java}
spark.read.format("csv")
            .option("sep", ";")
            .csv(logFile)
            .toDF("DATA", .....)
            .withColumn("DATA2", to_date(col("DATA"), "yyyy MMM"))
            .show(10)
{code}
Results from example dataset:
|*DATA*|*DATA2*|
|2018 giu|null|
|2018 mag|null|
|2018 apr|2018-04-01|
|2018 mar|2018-03-01|
|2018 feb|2018-02-01|
|2018 gen|null|
|2017 dic|null|
|2017 nov|2017-11-01|
|2017 ott|null|
|2017 set|null|

Expected results: all values converted (e.g. {{2018 giu}} should become {{2018-06-01}}).
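The root cause can be reproduced outside Spark with a plain {{java.text.SimpleDateFormat}} using the same pattern: a formatter built with {{Locale.US}} only knows English month names, so an Italian abbreviation like {{giu}} cannot be matched, while an Italian-locale formatter parses it fine.

{code:java}
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class LocaleDemo {
    public static void main(String[] args) throws ParseException {
        String input = "2018 giu"; // "giu" = giugno = June in Italian

        // A US-locale formatter only knows English month names,
        // so the Italian abbreviation cannot be matched.
        try {
            new SimpleDateFormat("yyyy MMM", Locale.US).parse(input);
            System.out.println("US locale: parsed");
        } catch (ParseException e) {
            System.out.println("US locale: ParseException"); // this branch runs
        }

        // The same pattern with an Italian locale parses it fine.
        SimpleDateFormat it = new SimpleDateFormat("yyyy MMM", Locale.ITALIAN);
        System.out.println("Italian locale: " + it.format(it.parse(input)));
    }
}
{code}

This also explains the partial results in the table above: {{apr}}, {{mar}}, {{feb}} and {{nov}} happen to coincide with the English abbreviations, so those rows parse even under {{Locale.US}}, while {{giu}}, {{mag}}, {{gen}}, {{dic}}, {{ott}} and {{set}} come back {{null}}.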

TEMPORARY WORKAROUND:

In the object {{org.apache.spark.sql.catalyst.util.DateTimeUtils}}, replace all 
occurrences of {{Locale.US}} with {{Locale.<your locale>}} and rebuild Spark.
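A less invasive workaround, which avoids patching and rebuilding Spark, is to normalise the locale-specific strings to ISO form before handing them to {{to_date}} (e.g. from a UDF). A minimal sketch in plain Java; the helper name {{toIso}} is hypothetical and not part of any Spark API:

{code:java}
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class ItalianDates {
    // Parse "yyyy MMM" with an explicit Italian locale and re-emit the date
    // in ISO form; the day of month defaults to 1, matching to_date's output.
    public static String toIso(String raw) throws ParseException {
        SimpleDateFormat in = new SimpleDateFormat("yyyy MMM", Locale.ITALIAN);
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
        return out.format(in.parse(raw));
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(toIso("2018 giu")); // prints 2018-06-01
        System.out.println(toIso("2017 dic")); // prints 2017-12-01
    }
}
{code}

Wrapped in a UDF, this keeps the fix in user code; the proper fix is still for {{to_date}} to accept a configurable locale.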

ADDITIONAL NOTES:

I can make a pull request available on GitHub.

  was:
The locale for DateTimeUtils, that is internally used by to_date SQL function, 
is set in code to be Locale.US.

This causes problems parsing a dataset which has dates in a different (italian 
in this case) language.
{code}
spark.read.format("csv")
            .option("sep", ";")
            .csv(logFile)
            .toDF("DATA", .....)
            .withColumn("DATA2", to_date(col("DATA"), "yyyy MMM"))
            .show(10)
{code}
Results from example dataset:
|*DATA*|*DATA2*|
|2018 giu|null|
|2018 mag|null|
|2018 apr|2018-04-01|
|2018 mar|2018-03-01|
|2018 feb|2018-02-01|
|2018 gen|null|
|2017 dic|null|
|2017 nov|2017-11-01|
|2017 ott|null|
|2017 set|null|

Expected results: All values converted.

TEMPORARY WORKAROUND:

In object {{org.apache.spark.sql.catalyst.util.DateTimeUtils}}, replace all 
instances of {{Locale.US}} with {{Locale.<your locale>}}

ADDITIONAL NOTES:

I can make a pull request available on GitHub.


> SQL: to_date function can't parse date strings in different locales.
> --------------------------------------------------------------------
>
>                 Key: SPARK-24969
>                 URL: https://issues.apache.org/jira/browse/SPARK-24969
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.1
>         Environment: Bare Spark 2.2.1 installation, on RHEL 6.
>            Reporter: Valentino Pinna
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
