[jira] [Updated] (OOZIE-2385) org.apache.oozie.TestCoordinatorEngineStreamLog.testCoordLogStreaming is flakey

Robert Kanter (JIRA) Wed, 07 Oct 2015 23:06:38 -0700

     [ 
https://issues.apache.org/jira/browse/OOZIE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Robert Kanter updated OOZIE-2385:
---------------------------------
    Attachment: OOZIE-2385.001.patch

Patch makes the changes described in the JIRA description.  I also removed a 
bunch of no longer needed code and tweaked a few things.

> org.apache.oozie.TestCoordinatorEngineStreamLog.testCoordLogStreaming is 
> flakey
> -------------------------------------------------------------------------------
>
>                 Key: OOZIE-2385
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2385
>             Project: Oozie
>          Issue Type: Bug
>          Components: tests
>    Affects Versions: trunk
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>             Fix For: trunk
>
>         Attachments: OOZIE-2385.001.patch
>
>
> {{org.apache.oozie.TestCoordinatorEngineStreamLog.testCoordLogStreaming}} is 
> flakey.
> It can fail with an error like this:
> {noformat}
> junit.framework.ComparisonFailure: expected:<Tue Oct 06 16:27:2[5] PDT 2015> 
> but was:<Tue Oct 06 16:27:2[4] PDT 2015>
>       at junit.framework.Assert.assertEquals(Assert.java:85)
>       at junit.framework.Assert.assertEquals(Assert.java:91)
>       at 
> org.apache.oozie.TestCoordinatorEngineStreamLog.__CLR4_0_38acukov0f(TestCoordinatorEngineStreamLog.java:202)
>       at 
> org.apache.oozie.TestCoordinatorEngineStreamLog.testCoordLogStreaming(TestCoordinatorEngineStreamLog.java:103)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at junit.framework.TestCase.runTest(TestCase.java:168)
>       at junit.framework.TestCase.runBare(TestCase.java:134)
>       at junit.framework.TestResult$1.protect(TestResult.java:110)
>       at junit.framework.TestResult.runProtected(TestResult.java:128)
>       at junit.framework.TestResult.run(TestResult.java:113)
>       at junit.framework.TestCase.run(TestCase.java:124)
>       at junit.framework.TestSuite.runTest(TestSuite.java:243)
>       at junit.framework.TestSuite.run(TestSuite.java:238)
>       at 
> org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
>       at org.junit.runners.Suite.runChild(Suite.java:128)
>       at org.junit.runners.Suite.runChild(Suite.java:24)
>       at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:744)
> {noformat}
> This was tricky to track down.  The problem is essentially that the filtering 
> API uses minute granularity but the actions are seconds, or even milliseconds 
> apart, so there are some edge cases here.  The code takes minute granularity, 
> goes back to milliseconds when querying the DB, and then uses seconds when 
> checking that the test worked correctly.
> This can lead to a situation like this example:
> \@1 created/nominal time: Oct 06 16:27:24 <and some milliseconds>
> \@2 created/nominal time: Oct 06 16:27:25 <and some milliseconds>
> The test passes \@2's created time as the starting time argument to the 
> filter, converted with {{DateUtill's}} UTC MASK (i.e. minutes granularity):
> - Oct 06 16:27
> This gets parsed back to a Date object, which seconds/milliseconds assumed to 
> be 0's
> - Oct 06 16:27:00 0000
> This gets turned into a TimeStamp object, and used in the DB filtering query
> - gets all actions with created time > Oct 06 16:27:00, which includes \@1
> CoordinatorEngine sorts by action ID (i.e. \@#) and uses created time of 
> smallest, which is \@1
> - Oct 06 16:27:24
> The test checks that \@2’s created time is equal to previously determined 
> time from filter (i.e. \@1’s created time)
> - Does a toString(), which uses seconds —-> not equal
> Note: The filter querying is actually done against the nominal times, but the 
> example is simpler without all that and conveys the same idea.
> The actions get created within 100s of milliseconds during the test (at least 
> on my machine), so most of them time, they have the same seconds, and the 
> toString() comparison is equal.  When the timing works out that the first and 
> second action are created at different clock seconds (even if less than a 
> second apart), it fails.  I couldn’t reproduce this on my machine, but is 
> more likely to happen on the generally slower Jenkins machines we run because 
> the time between action creations will be longer.
> Basically the problem is that the filtering works at a minute granularity but 
> the actions are seconds, if not milliseconds, apart.  In the real world this 
> isn’t a big deal because the API takes minutes and we simply use that, but to 
> verify the test is working we should simply make the actions minutes apart.  
> To fix this, we can just directly insert the job and actions into the DB 
> instead of actually running it so we can space the actions apart by minutes; 
> this should also make the test faster as well (the test takes about 6 minutes 
> on my machine as-is, but only 1 minute and 30 seconds with the fix).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (OOZIE-2385) org.apache.oozie.TestCoordinatorEngineStreamLog.testCoordLogStreaming is flakey

Reply via email to