[
https://issues.apache.org/jira/browse/OOZIE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Kanter updated OOZIE-2385:
---------------------------------
Attachment: OOZIE-2385.001.patch
Patch makes the changes described in the JIRA description. I also removed a
bunch of no longer needed code and tweaked a few things.
> org.apache.oozie.TestCoordinatorEngineStreamLog.testCoordLogStreaming is
> flakey
> -------------------------------------------------------------------------------
>
> Key: OOZIE-2385
> URL: https://issues.apache.org/jira/browse/OOZIE-2385
> Project: Oozie
> Issue Type: Bug
> Components: tests
> Affects Versions: trunk
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Fix For: trunk
>
> Attachments: OOZIE-2385.001.patch
>
>
> {{org.apache.oozie.TestCoordinatorEngineStreamLog.testCoordLogStreaming}} is
> flakey.
> It can fail with an error like this:
> {noformat}
> junit.framework.ComparisonFailure: expected:<Tue Oct 06 16:27:2[5] PDT 2015>
> but was:<Tue Oct 06 16:27:2[4] PDT 2015>
> at junit.framework.Assert.assertEquals(Assert.java:85)
> at junit.framework.Assert.assertEquals(Assert.java:91)
> at
> org.apache.oozie.TestCoordinatorEngineStreamLog.__CLR4_0_38acukov0f(TestCoordinatorEngineStreamLog.java:202)
> at
> org.apache.oozie.TestCoordinatorEngineStreamLog.testCoordLogStreaming(TestCoordinatorEngineStreamLog.java:103)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at junit.framework.TestCase.runTest(TestCase.java:168)
> at junit.framework.TestCase.runBare(TestCase.java:134)
> at junit.framework.TestResult$1.protect(TestResult.java:110)
> at junit.framework.TestResult.runProtected(TestResult.java:128)
> at junit.framework.TestResult.run(TestResult.java:113)
> at junit.framework.TestCase.run(TestCase.java:124)
> at junit.framework.TestSuite.runTest(TestSuite.java:243)
> at junit.framework.TestSuite.run(TestSuite.java:238)
> at
> org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
> at org.junit.runners.Suite.runChild(Suite.java:128)
> at org.junit.runners.Suite.runChild(Suite.java:24)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> {noformat}
> This was tricky to track down. The problem is essentially that the filtering
> API uses minute granularity but the actions are seconds, or even milliseconds
> apart, so there are some edge cases here. The code takes minute granularity,
> goes back to milliseconds when querying the DB, and then uses seconds when
> checking that the test worked correctly.
> This can lead to a situation like this example:
> \@1 created/nominal time: Oct 06 16:27:24 <and some milliseconds>
> \@2 created/nominal time: Oct 06 16:27:25 <and some milliseconds>
> The test passes \@2's created time as the starting time argument to the
> filter, converted with {{DateUtill's}} UTC MASK (i.e. minutes granularity):
> - Oct 06 16:27
> This gets parsed back to a Date object, which seconds/milliseconds assumed to
> be 0's
> - Oct 06 16:27:00 0000
> This gets turned into a TimeStamp object, and used in the DB filtering query
> - gets all actions with created time > Oct 06 16:27:00, which includes \@1
> CoordinatorEngine sorts by action ID (i.e. \@#) and uses created time of
> smallest, which is \@1
> - Oct 06 16:27:24
> The test checks that \@2’s created time is equal to previously determined
> time from filter (i.e. \@1’s created time)
> - Does a toString(), which uses seconds —-> not equal
> Note: The filter querying is actually done against the nominal times, but the
> example is simpler without all that and conveys the same idea.
> The actions get created within 100s of milliseconds during the test (at least
> on my machine), so most of them time, they have the same seconds, and the
> toString() comparison is equal. When the timing works out that the first and
> second action are created at different clock seconds (even if less than a
> second apart), it fails. I couldn’t reproduce this on my machine, but is
> more likely to happen on the generally slower Jenkins machines we run because
> the time between action creations will be longer.
> Basically the problem is that the filtering works at a minute granularity but
> the actions are seconds, if not milliseconds, apart. In the real world this
> isn’t a big deal because the API takes minutes and we simply use that, but to
> verify the test is working we should simply make the actions minutes apart.
> To fix this, we can just directly insert the job and actions into the DB
> instead of actually running it so we can space the actions apart by minutes;
> this should also make the test faster as well (the test takes about 6 minutes
> on my machine as-is, but only 1 minute and 30 seconds with the fix).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)