[
https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315938#comment-15315938
]
Brett Randall commented on SPARK-15723:
---------------------------------------
Thanks for merging. And thanks for the Scala REPL test - I can confirm that
this is driven by a combination of *both* the default TimeZone and the default
Locale - the default Locale affects the interpretation of the short TZ code,
which makes sense.
{{Australia/Sydney/en_AU}} -> {color:red}*false*{color}
{noformat}
scala -J-Duser.timezone="Australia/Sydney" -J-Duser.country=AU <<EOF
val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time == 1424470877190L
EOF
scala> val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time: Long = 1424413277190
scala> time == 1424470877190L
res0: Boolean = false
{noformat}
{{Australia/Sydney/en_US}} -> {color:red}*false*{color}
{noformat}
scala -J-Duser.timezone="Australia/Sydney" -J-Duser.country=US <<EOF
val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time == 1424470877190L
EOF
scala> val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time: Long = 1424413277190
scala> time == 1424470877190L
res0: Boolean = false
{noformat}
{{America/New_York/en_US}} -> {color:green}*true*{color}
{noformat}
scala -J-Duser.timezone="America/New_York" -J-Duser.country=US <<EOF
val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time == 1424470877190L
EOF
scala> val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time: Long = 1424470877190
scala> time == 1424470877190L
res0: Boolean = true
{noformat}
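As a quick arithmetic check on the two distinct values in the transcripts above, here is a small sketch in Java (the classes involved are all JDK ones). The class and method names are made up for illustration. The values turn out to be exactly 16 hours apart, and the failing value reads as the same wall-clock time at UTC+11, which is Sydney's offset during daylight time in February; that reading is an inference from the numbers, not something the transcripts state.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

public class EpochOffsetCheck {
    // Whole hours between two epoch-millisecond values.
    static long hoursApart(long earlierMillis, long laterMillis) {
        return Duration.ofMillis(laterMillis - earlierMillis).toHours();
    }

    // Wall-clock time of an epoch-millisecond value at a fixed UTC offset.
    static String wallClock(long epochMillis, int offsetHours) {
        return Instant.ofEpochMilli(epochMillis)
                .atOffset(ZoneOffset.ofHours(offsetHours))
                .toLocalDateTime()
                .toString();
    }

    public static void main(String[] args) {
        long sydneyResult = 1424413277190L; // value from the two failing runs
        long expected = 1424470877190L;     // value the comparison expects

        System.out.println(hoursApart(sydneyResult, expected)); // 16
        System.out.println(wallClock(expected, -5));            // 2015-02-20T17:21:17.190
        System.out.println(wallClock(sydneyResult, 11));        // 2015-02-20T17:21:17.190
    }
}
```

Both values decode to the wall-clock time in the input string, just at different offsets, so the difference comes entirely from how {{EST}} was resolved.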
So you were correct - this _can_ be disambiguated by applying a bias to the SDF
in the code, but this would necessarily be a fixed bias, and it has to be done
with a {{Calendar}} not a {{TimeZone}}:
{code}
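// Locale.US rather than new Locale("en_US"): the one-argument constructor
// treats the whole string as a language code, so it never yields en/US.
sdf.setCalendar(Calendar.getInstance(TimeZone.getTimeZone("America/New_York"),
Locale.US))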
{code}
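A self-contained sketch of that fixed bias, for anyone who wants to try it outside Spark. The class and method names are hypothetical, and passing {{Locale.US}} to the {{SimpleDateFormat}} constructor is my own addition, on the assumption that the formatter's zone-name strings should be pinned as well as the calendar. It relies on the en_US locale data naming New York's standard time {{EST}}.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

public class BiasedShortZoneParse {
    // Parses a timestamp with a short zone name, biasing the lookup
    // towards US Eastern regardless of the JVM's default zone and locale.
    static long parseWithUsEasternBias(String text) {
        // Locale.US fixes which zone-name strings the formatter consults.
        SimpleDateFormat sdf =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz", Locale.US);
        // The calendar's zone is the first place the parser looks for a
        // matching short name, so "EST" is tried against New York first.
        sdf.setCalendar(Calendar.getInstance(
                TimeZone.getTimeZone("America/New_York"), Locale.US));
        try {
            return sdf.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseWithUsEasternBias("2015-02-20T17:21:17.190EST"));
    }
}
```

Running it with {{-Duser.timezone=Australia/Sydney}} is the interesting case, since that is the environment where the unbiased parse went wrong.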
I'm not certain this approach is better or more correct, but it would remove
any ambiguity in the short TZ codes, and it could be documented: all short TZ
codes are evaluated as if they were in this fixed TZ/Locale. That might upset
someone whose deployment wants {{MST}} = Malaysia Standard Time and not
Mountain Time.
Make a note here if you think it is worth pursuing further, but I suspect we
just have to honour the local env defaults and discourage abbreviated TZs. And
the test fix is merged now, so all good, thanks.
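For illustration, here is what avoiding abbreviated TZs could look like at the JDK level: a numeric offset has only one meaning, so the result no longer depends on the default TimeZone or Locale. This is a sketch with hypothetical class and method names, and it exercises {{SimpleDateFormat}} and {{java.time}} directly; it makes no claim about which input formats {{SimpleDateParam}} accepts.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.OffsetDateTime;
import java.util.Locale;

public class OffsetTimestampParse {
    // Legacy API: pattern letter Z accepts an RFC 822 offset such as -0500.
    static long parseLegacy(String text) {
        SimpleDateFormat sdf =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ", Locale.ROOT);
        try {
            return sdf.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + text, e);
        }
    }

    // java.time: ISO-8601 offset form such as -05:00.
    static long parseIso(String text) {
        return OffsetDateTime.parse(text).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(parseLegacy("2015-02-20T17:21:17.190-0500")); // 1424470877190
        System.out.println(parseIso("2015-02-20T17:21:17.190-05:00"));   // 1424470877190
    }
}
```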
> SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ
> name
> ----------------------------------------------------------------------------------
>
> Key: SPARK-15723
> URL: https://issues.apache.org/jira/browse/SPARK-15723
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.1
> Reporter: Brett Randall
> Assignee: Brett Randall
> Priority: Minor
> Labels: test
> Fix For: 1.6.2, 2.0.0
>
>
> {{org.apache.spark.status.api.v1.SimpleDateParamSuite}} has this assertion:
> {code}
> new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be (1424470877190L)
> {code}
> This test is fragile and fails when executing in an environment where the
> local default timezone causes {{EST}} to be interpreted as something other
> than US Eastern Standard Time. If your local timezone is
> {{Australia/Sydney}}, then {{EST}} equates to {{GMT+10}} and you will get:
> {noformat}
> date parsing *** FAILED ***
> 1424413277190 was not equal to 1424470877190 (SimpleDateParamSuite.scala:29)
> {noformat}
> In short, {{SimpleDateFormat}} is sensitive to the local default {{TimeZone}}
> when interpreting short zone names. According to the {{TimeZone}} javadoc,
> they ought not be used:
> {quote}
> Three-letter time zone IDs
> For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such
> as "PST", "CTT", "AST") are also supported. However, their use is deprecated
> because the same abbreviation is often used for multiple time zones (for
> example, "CST" could be U.S. "Central Standard Time" and "China Standard
> Time"), and the Java platform can then only recognize one of them.
> {quote}