[ 
https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315725#comment-15315725
 ] 

Brett Randall commented on SPARK-15723:
---------------------------------------

Unfortunately I don't think that will help in this case, or in any other case
where a client-supplied short-code TZ name is ambiguous and its meaning depends
on the server's local default timezone (or, as shown here, on the default
timezone of the machine running a test).

{{SimpleDateFormat}} parsing has an unusual side-effect on its underlying
{{Calendar}} instance, triggered by {{parse()}}. The JavaDoc for
{{java.text.DateFormat.setTimeZone(TimeZone)}} warns about this behaviour:

{quote}
The {{TimeZone}} set by this method may be overwritten as a result of a call to 
the parse method.
{quote}
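To make the side-effect concrete, here is a minimal sketch. The class {{ParseOverwritesZone}} and its method are hypothetical demo code, not part of Spark or the JDK. It pins a formatter to {{GMT}} with {{setTimeZone()}}, parses a string ending in {{EST}}, and reports the formatter's zone ID before and after the call. Which zone replaces {{GMT}} depends on the JDK's zone-name data and on the default {{TimeZone}}, so the sketch assumes no particular replacement. The only point is that the zone is no longer {{GMT}}.

{code:java}
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class ParseOverwritesZone {
    // Returns the formatter's TimeZone ID before and after parsing `text`,
    // having first pinned the formatter to GMT via setTimeZone().
    public static String[] zoneIdsAroundParse(String text) {
        SimpleDateFormat sdf =
            new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz", Locale.US);
        sdf.setTimeZone(TimeZone.getTimeZone("GMT"));
        String before = sdf.getTimeZone().getID();
        try {
            sdf.parse(text); // parsing the short zone name may call setTimeZone()
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable: " + text, e);
        }
        String after = sdf.getTimeZone().getID();
        return new String[] {before, after};
    }

    public static void main(String[] args) {
        String[] ids = zoneIdsAroundParse("2015-02-20T17:21:17.190EST");
        System.out.println("before parse(): " + ids[0]);
        System.out.println("after parse():  " + ids[1]);
    }
}
{code}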

If the SDF's current {{Calendar}} has a {{TimeZone}} (either
{{TimeZone.getDefault()}} or one set explicitly later, as in the second attempt
in {{SimpleDateParam}}) that does not match the timezone decoded from a
short-code TZ in the parsed string, SDF assumes that the best "local"
interpretation of the abbreviation is the correct one and switches its
{{Calendar}} to that TZ. So in my test case, if my {{TimeZone.getDefault()}} is
{{Australia/Sydney}}, which has the short-name alias {{EST}}, then when {{EST}}
is encountered in the parsed date string, SDF says "they must mean Eastern
(Australian) Standard Time; I'll interpret it as that and update my calendar",
whereas the same code running with a different default TZ will pick
{{America/New_York}} or {{-0500}}. It does this during parsing, before the
datetime is computed, so the result is in the guessed timezone, not in any
timezone you pre-seed with {{setTimeZone()}} or even {{setCalendar()}}: those
will always be overwritten if the TZ is parsed. You could block the update in
an SDF subclass, but you would still be left with the same problem: how should
{{EST}}, which is inherently ambiguous in the old list of short codes, be
interpreted?
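Here is a sketch of the dependence on the default TZ. The class and method names are hypothetical. It assumes a JDK whose zone-name data maps {{EST}} to {{America/New_York}} when that zone is the default. Around each parse it swaps the JVM default with {{TimeZone.setDefault()}}. That mutates global state, so treat this as a demo, not something to do in production code. With {{America/New_York}} as the default it should yield {{1424470877190}}, the value the test expects. The {{Australia/Sydney}} result depends on how the running JDK abbreviates Sydney time, so no value is claimed for it here.

{code:java}
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class DefaultZoneDependence {
    // Installs defaultId as the JVM default TimeZone, parses `text` (which
    // carries a short zone name), then restores the previous default.
    public static long parseUnderDefault(String text, String defaultId) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(defaultId));
            // Built after setDefault(), so its Calendar starts in defaultId.
            SimpleDateFormat sdf =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz", Locale.US);
            return sdf.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable: " + text, e);
        } finally {
            TimeZone.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        String text = "2015-02-20T17:21:17.190EST";
        for (String id : new String[] {"America/New_York", "Australia/Sydney"}) {
            System.out.println(id + " -> " + parseUnderDefault(text, id));
        }
    }
}
{code}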

The JDK code is in the method
{{java.text.SimpleDateFormat.subParseZoneString(String, int, CalendarBuilder)}},
which calls {{setTimeZone(...)}} on the formatter itself when it parses a TZ
that differs from the formatter's current one.

So any pre-setting of a {{TimeZone}} or {{Calendar}} is moot if you allow the
short TZ code to be parsed. The code is ambiguous for many values, but it is
the caller's fault for using a deprecated short TZ form. The result depends
entirely on {{TimeZone.getDefault()}}, which you don't want to change or rely
on. It is unfortunate that SDF does this, but these short forms are deprecated
anyway. Clients should not be using them. They should use an RFC 822 numeric
offset (e.g. {{-0500}}), which avoids the ambiguity.
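Here is a minimal sketch of the RFC 822 alternative. The class and method names are hypothetical. The pattern letter {{Z}} expects a numeric offset such as {{-0500}}. The offset therefore comes from the string itself, not from a zone-name lookup. The same instant should come back whatever the JVM default {{TimeZone}} is. With {{-0500}} that instant is {{1424470877190}}, matching the value the test expects.

{code:java}
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class Rfc822Offset {
    // 'Z' expects an RFC 822 offset such as -0500, so the instant is fixed
    // by the input string rather than by the default TimeZone.
    public static long parseRfc822(String text) {
        SimpleDateFormat sdf =
            new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ", Locale.US);
        try {
            return sdf.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable: " + text, e);
        }
    }

    // Runs parseRfc822 under a temporary JVM default TimeZone, then restores it.
    public static long parseUnderDefault(String text, String defaultId) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(defaultId));
            return parseRfc822(text);
        } finally {
            TimeZone.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        String text = "2015-02-20T17:21:17.190-0500";
        for (String id : new String[] {"America/New_York", "Australia/Sydney", "UTC"}) {
            System.out.println(id + " -> " + parseUnderDefault(text, id));
        }
    }
}
{code}

On Java 7 and later, the pattern letter {{X}} accepts ISO 8601 offsets such as {{-05:00}} in the same way.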


> SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ 
> name
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-15723
>                 URL: https://issues.apache.org/jira/browse/SPARK-15723
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.1
>            Reporter: Brett Randall
>            Priority: Minor
>              Labels: test
>
> {{org.apache.spark.status.api.v1.SimpleDateParamSuite}} has this assertion:
> {code}
>     new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be (1424470877190L)
> {code}
> This test is fragile and fails when executed in an environment where the
> local default timezone causes {{EST}} to be interpreted as something other
> than US Eastern Standard Time. If your local timezone is
> {{Australia/Sydney}}, then {{EST}} is read as Sydney time ({{GMT+10}}, or
> {{GMT+11}} during daylight saving, as on the February test date) and you
> will get:
> {noformat}
> date parsing *** FAILED ***
> 1424413277190 was not equal to 1424470877190 (SimpleDateParamSuite.scala:29)
> {noformat}
> In short, {{SimpleDateFormat}} is sensitive to the local default {{TimeZone}}
> when interpreting short zone names. According to the {{TimeZone}} javadoc,
> they ought not to be used:
> {quote}
> Three-letter time zone IDs
> For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such 
> as "PST", "CTT", "AST") are also supported. However, their use is deprecated 
> because the same abbreviation is often used for multiple time zones (for 
> example, "CST" could be U.S. "Central Standard Time" and "China Standard 
> Time"), and the Java platform can then only recognize one of them.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
