Robert Kanter created OOZIE-2155:
------------------------------------
Summary: Incorrect DST Shifts are occurring based on the Database
timezone
Key: OOZIE-2155
URL: https://issues.apache.org/jira/browse/OOZIE-2155
Project: Oozie
Issue Type: Bug
Components: core
Affects Versions: 4.1.0, trunk
Reporter: Robert Kanter
Priority: Blocker
We discovered a critical bug where incorrect Daylight Saving Time shifts were
occurring based on the Database timezone.
Try running this Coordinator:
{code:xml}
<coordinator-app name=“foo” frequency="${coord:hours(1)}"
start="2014-11-02T04:15Z" end="2014-11-02T16:00Z" timezone=“UTC”
xmlns="uri:oozie:coordinator:0.1">
<controls>
<concurrency>10</concurrency>
</controls>
<action>
<workflow>
<app-path>${appPath}</app-path>
<configuration>
<property>
<name>jobTracker</name>
<value>${jobTracker}</value>
</property>
<property>
<name>nameNode</name>
<value>${nameNode}</value>
</property>
<property>
<name>queueName</name>
<value>${queueName}</value>
</property>
</configuration>
</workflow>
</action>
</coordinator-app>
{code}
Note that it runs over a DST shift (at least in most US timezones).
Here's a sample of some of the actions, along with their Nominal Times:
{noformat}
0000013-150212140838307-oozie-rkan-C@4 2014-11-02 07:15 GMT
0000013-150212140838307-oozie-rkan-C@6 2014-11-02 09:15 GMT
0000013-150212140838307-oozie-rkan-C@5 2014-11-02 09:15 GMT
0000013-150212140838307-oozie-rkan-C@7 2014-11-02 10:15 GMT
{noformat}
Note that actions 5 and 6 have the same time. This is incorrect because the
times are in GMT (with no DST). I've also confirmed via a debugger that these
dates have the same number of seconds since epoch (so it's not just a rendering
issue).
By the way, if you're in a different timezone, you'll see this problem occur on
different actions because the issue is related to the timezone that your
database is in. It depends on when the DST shift occurs in your timezone; for
example, in "America/New_York", it happens with actions 2 and 3.
On that note, if I ask Oozie to use "America/Los_Angeles" to print the dates, I
get this:
{noformat}
0000013-150212140838307-oozie-rkan-C@4 2014-11-02 00:15 PDT
0000013-150212140838307-oozie-rkan-C@6 2014-11-02 01:15 PST
0000013-150212140838307-oozie-rkan-C@5 2014-11-02 01:15 PST
0000013-150212140838307-oozie-rkan-C@7 2014-11-02 02:15 PST
{noformat}
Action 5's nominal time should be {{2014-11-02 01:15 PDT}}, not {{2014-11-02
01:15 PST}}.
Using the debugger some more, I verified that Oozie is creating the nominal
times correctly, and writing them to the database correctly (at least, it's
converting them to Java's SQL TimeStamp objects correctly; OpenJPA handles
writing them). But when the problematic value is read back from the database,
it has the wrong value!
Here's something interesting from the [MySQL
documentation|http://dev.mysql.com/doc/refman/5.5/en/datetime.html]:
{quote}MySQL converts TIMESTAMP values from the current time zone to UTC for
storage, and back from UTC to the current time zone for retrieval. (This does
not occur for other types such as DATETIME.) By default, the current time zone
for each connection is the server's time. The time zone can be set on a
per-connection basis. As long as the time zone setting remains constant, you
get back the same value you store. If you store a TIMESTAMP value, and then
change the time zone and retrieve the value, the retrieved value is different
from the value you stored. This occurs because the same time zone was not used
for conversion in both directions.
{quote}
So, I think what's happening is that it's interpreting everything in PST, and
not in PST and PDT depending on the time itself. Ideally, it would just store
the time since epoch, like Java's Date does, but it's doing this broken
interpretation instead. In fact, this [Stack
Overflow|http://stackoverflow.com/questions/1646171/mysql-datetime-fields-and-daylight-savings-time-how-do-i-reference-the-extra/1650910#1650910]
I found talks about this problem.
I tried to create a unit test to show the problem without all the action stuff,
but it seems like HSQLDB doesn't have this problem, so the test passes. I've
confirmed that this affects Derby and MySQL; I'm not sure about Postgres,
Oracle, or SQLServer.
I was able to find a workaround for Derby and one of my colleagues found one
for MySQL:
- For Derby, you just need to change the JVM's timezone to GMT (see
[here|http://objectmix.com/apache/647950-moving-derby-database-across-timezones.html]).
All you have to do is add {{-Duser.timezone=GMT}} to {{CATALINA_OPTS}}.
- For MySQL, you can either change the global timezone to GMT (which the DB
admin probably won't go for), or you can add
{{useLegacyDatetimeCode=false&serverTimezone=GMT}} to JDBC URL. For example:
{code:xml}
<property>
<name>oozie.service.JPAService.jdbc.url</name>
<value>jdbc:mysql://HOST/oozie?useUnicode=true&characterEncoding=UTF-8&useLegacyDatetimeCode=false&serverTimezone=GMT</value>
</property>
{code}
I imagine the Derby workaround won't work for anything else, but perhaps the
MySQL workaround would?
The easiest way to fix this is to add the {{user.timezone}} thing (to fix
Derby), and have Oozie always add the JDBC configs to the JDBC URL (for the
others). If Oracle, Postgres, and SQLServer don't support that, we'll have to
come up with other ideas. The other ideas I had were to replace TIMESTAMP with
DATETIME or the SQL equivalent of {{long}}; though these have the downside of
being schema changes and possibly other side effects.
I have some homework for everyone :)
Please check if the DB type listed below is affected by this, and if so, does
the above workaround (or some similar configs) fix it?
- [~bowenzhangusa]: SQLServer
- [~chitnis], [~puru], [~rohini], or [~egashira]: Oracle
- Me: Postgres
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)