Robert Kanter created OOZIE-2385:
------------------------------------

             Summary: 
org.apache.oozie.TestCoordinatorEngineStreamLog.testCoordLogStreaming is flakey
                 Key: OOZIE-2385
                 URL: https://issues.apache.org/jira/browse/OOZIE-2385
             Project: Oozie
          Issue Type: Bug
          Components: tests
    Affects Versions: trunk
            Reporter: Robert Kanter
            Assignee: Robert Kanter
             Fix For: trunk


{{org.apache.oozie.TestCoordinatorEngineStreamLog.testCoordLogStreaming}} is 
flakey.

It can fail with an error like this:
{noformat}
junit.framework.ComparisonFailure: expected:<Tue Oct 06 16:27:2[5] PDT 2015> 
but was:<Tue Oct 06 16:27:2[4] PDT 2015>
        at junit.framework.Assert.assertEquals(Assert.java:85)
        at junit.framework.Assert.assertEquals(Assert.java:91)
        at 
org.apache.oozie.TestCoordinatorEngineStreamLog.__CLR4_0_38acukov0f(TestCoordinatorEngineStreamLog.java:202)
        at 
org.apache.oozie.TestCoordinatorEngineStreamLog.testCoordLogStreaming(TestCoordinatorEngineStreamLog.java:103)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at junit.framework.TestCase.runTest(TestCase.java:168)
        at junit.framework.TestCase.runBare(TestCase.java:134)
        at junit.framework.TestResult$1.protect(TestResult.java:110)
        at junit.framework.TestResult.runProtected(TestResult.java:128)
        at junit.framework.TestResult.run(TestResult.java:113)
        at junit.framework.TestCase.run(TestCase.java:124)
        at junit.framework.TestSuite.runTest(TestSuite.java:243)
        at junit.framework.TestSuite.run(TestSuite.java:238)
        at 
org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
        at org.junit.runners.Suite.runChild(Suite.java:128)
        at org.junit.runners.Suite.runChild(Suite.java:24)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
{noformat}

This was tricky to track down.  The problem is essentially that the filtering 
API uses minute granularity but the actions are seconds, or even milliseconds 
apart, so there are some edge cases here.  The code takes minute granularity, 
goes back to milliseconds when querying the DB, and then uses seconds when 
checking that the test worked correctly.

This can lead to a situation like this example:
\@1 created/nominal time: Oct 06 16:27:24 <and some milliseconds>
\@2 created/nominal time: Oct 06 16:27:25 <and some milliseconds>

The test passes \@2's created time as the starting time argument to the filter, 
converted with {{DateUtill's}} UTC MASK (i.e. minutes granularity):
- Oct 06 16:27

This gets parsed back to a Date object, which seconds/milliseconds assumed to 
be 0's
- Oct 06 16:27:00 0000

This gets turned into a TimeStamp object, and used in the DB filtering query
- gets all actions with created time > Oct 06 16:27:00, which includes \@1

CoordinatorEngine sorts by action ID (i.e. \@#) and uses created time of 
smallest, which is \@1
- Oct 06 16:27:24

The test checks that \@2’s created time is equal to previously determined time 
from filter (i.e. \@1’s created time)
- Does a toString(), which uses seconds —-> not equal

Note: The filter querying is actually done against the nominal times, but the 
example is simpler without all that and conveys the same idea.

The actions get created within 100s of milliseconds during the test (at least 
on my machine), so most of them time, they have the same seconds, and the 
toString() comparison is equal.  When the timing works out that the first and 
second action are created at different clock seconds (even if less than a 
second apart), it fails.  I couldn’t reproduce this on my machine, but is more 
likely to happen on the generally slower Jenkins machines we run because the 
time between action creations will be longer.

Basically the problem is that the filtering works at a minute granularity but 
the actions are seconds, if not milliseconds, apart.  In the real world this 
isn’t a big deal because the API takes minutes and we simply use that, but to 
verify the test is working we should simply make the actions minutes apart.  To 
fix this, we can just directly insert the job and actions into the DB instead 
of actually running it so we can space the actions apart by minutes; this 
should also make the test faster as well (the test takes about 6 minutes on my 
machine as-is, but only 1 minute and 30 seconds with the fix).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to