[
https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315725#comment-15315725
]
Brett Randall commented on SPARK-15723:
---------------------------------------
Unfortunately I don't think that will help in this case, or any other case
where there is ambiguity for a client-supplied short-code TZ name
as-interpreted by a certain server-side local default timezone, or as shown in
this case, in a test.
{{SimpleDateFormat}} parsing has some unusual side-effects on its underlying
{{Calendar}} instance, which are triggered by a {{parse()}}. There is a
warning of this behaviour in the JavaDoc for
{{java.text.DateFormat.setTimeZone(TimeZone)}}:
{quote}
The {{TimeZone}} set by this method may be overwritten as a result of a call to
the parse method.
{quote}
If the SDF's current {{Calendar}} has a {{TimeZone}} (which is either
{{TimeZone.getDefault()}} or as later set as is done in the second attempt in
{{SimpleDateParam}}) which does not match a timezone that is decoded from a
short-code TZ in the parsed string, SDF makes the assumption that the best
"local" interpretation of the abbreviation is the correct one, and updates its
local {{Calendar}} instance to the new TZ. So in my test-case, if my
{{TimeZone.getDefault()}} is {{Australia/Sydney}}, which has a short-name alias
of {{EST}}, when {{EST}} is encountered in the parsed date string, SDF says
"they must mean Eastern (Australian) Standard Time, I'll interpret as that and
update my calendar", whereas that code running with a different default TZ will
pick {{America/New_York}} or {{-0500}}. It does this during the parsing and
before the computation of the datetime, so the result is in the guessed
timezone, not any timezone you pre-seed with {{setTimeZone()}} or even
{{setCalendar()}} - they will always be overwritten if the TZ is parsed. You
could block the update in an SDF sub-class, but you are still left with the
same problem - how should {{EST}}, which is necessarily ambiguous according to
the old list of short codes, be interpreted?
The JDK code is in method
{{java.text.SimpleDateFormat.subParseZoneString\(String, int,
CalendarBuilder\)}}, where it can be seen to call {{setTimeZone\(...\)}} on
itself if it parses a TZ that differs from the existing SDF one.
So any pre-setting of a {{TimeZone}} or {{Calendar}} is moot if you allow the
short-TZ code to be parsed - it is ambiguous for many values, but it is the
caller's fault for using a deprecated short TZ form. The result completely
depends on {{TimeZone.getDefault()}}, which you don't want to change or
rely-on. It is unfortunate that SDF does this, but these short-forms are
completely deprecated anyway. Clients should not be using them - they should
be using an RFC822 form, which avoids ambiguity.
> SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ
> name
> ----------------------------------------------------------------------------------
>
> Key: SPARK-15723
> URL: https://issues.apache.org/jira/browse/SPARK-15723
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.1
> Reporter: Brett Randall
> Priority: Minor
> Labels: test
>
> {{org.apache.spark.status.api.v1.SimpleDateParamSuite}} has this assertion:
> {code}
> new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be
> (1424470877190L)
> {code}
> This test is fragile and fails when executing in an environment where the
> local default timezone causes {{EST}} to be interpreted as something other
> than US Eastern Standard Time. If your local timezone is
> {{Australia/Sydney}}, then {{EST}} equates to {{GMT+10}} and you will get:
> {noformat}
> date parsing *** FAILED ***
> 1424413277190 was not equal to 1424470877190 (SimpleDateParamSuite.scala:29)
> {noformat}
> In short, {{SimpleDateFormat}} is sensitive to the local default {{TimeZone}}
> when interpreting short zone names. According to the {{TimeZone}} javadoc,
> they ought not be used:
> {quote}
> Three-letter time zone IDs
> For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such
> as "PST", "CTT", "AST") are also supported. However, their use is deprecated
> because the same abbreviation is often used for multiple time zones (for
> example, "CST" could be U.S. "Central Standard Time" and "China Standard
> Time"), and the Java platform can then only recognize one of them.
> {quote}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]