I think the problem might be related to the way the test is constructed. The test submits a job to the JM and then tries to poll the accumulators from the JM. If it does not succeed, the polling is retried with a decreasing pause in between. Furthermore, the task which updates the accumulators also sleeps for the same period until it reads the next element and updates the accumulators.
Since the test does not use explicit synchronization but instead relies on sleeps, it will most likely exhibit flaky behaviour. Sleeps don't work reliably enough, especially on Travis, to guarantee a certain thread interleaving. I'd recommend introducing explicit synchronization mechanisms which control the behaviour of the accumulator-producing task, and explicit testing messages which indicate that a new accumulator value has arrived at the JM (a rough sketch of this idea follows after the quoted thread below).

Cheers,
Till

On Thu, Jul 16, 2015 at 11:04 PM, Matthias J. Sax <[email protected]> wrote:

> Hi,
>
> the test still fails. This time in both runs (Flink Travis and my own
> Travis) -- only for Java 8 again:
>
> https://travis-ci.org/apache/flink/jobs/71314132
> https://travis-ci.org/mjsax/flink/jobs/71179608
>
> -Matthias
>
> On 07/16/2015 02:28 PM, Matthias J. Sax wrote:
> > Great! I will. As 4 of 5 runs succeeded I cannot test explicitly. Will
> > have an eye on it in future runs.
> >
> > -Matthias
> >
> > On 07/16/2015 02:24 PM, Maximilian Michels wrote:
> >> Hi Matthias,
> >>
> >> I've pushed a fix to the master. The problem should be solved. Please tell
> >> me if your Travis reports an error again. My Travis never complained :)
> >>
> >> Cheers,
> >> Max
> >>
> >> On Thu, Jul 16, 2015 at 12:00 PM, Maximilian Michels <[email protected]> wrote:
> >>
> >>> Hi Matthias,
> >>>
> >>> This is indeed a timing issue when checking for the results in this test.
> >>> The new accumulator implementation now continuously reports from the
> >>> running tasks to the job manager. This was merged yesterday.
> >>>
> >>> The assertion that fails there is a bit strict. Actually, I've already
> >>> integrated a retry mechanism that fails only if the assertions don't hold
> >>> for a configured number of times.
> >>>
> >>> I'll commit a fix to the master. Thanks for reporting!
> >>>
> >>> Cheers,
> >>> Max
> >>>
> >>> On Thu, Jul 16, 2015 at 11:33 AM, Ufuk Celebi <[email protected]> wrote:
> >>>
> >>>> Hey,
> >>>>
> >>>> this has been merged yesterday. I guess it's a timing issue when
> >>>> verifying the results. Can you file an issue for this?
> >>>>
> >>>> – Ufuk
> >>>>
> >>>> On 16 Jul 2015, at 11:30, Matthias J. Sax <[email protected]> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I hit another failing test (that is new to me):
> >>>>>
> >>>>>> Results :
> >>>>>> Failed tests:
> >>>>>>   AccumulatorLiveITCase.testProgram:106->access$1100:68->checkFlinkAccumulators:189 null
> >>>>>>
> >>>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.694 sec <<< FAILURE! - in org.apache.flink.test.accumulators.AccumulatorLiveITCase
> >>>>>> testProgram(org.apache.flink.test.accumulators.AccumulatorLiveITCase)  Time elapsed: 8.021 sec <<< FAILURE!
> >>>>>> java.lang.AssertionError: null
> >>>>>>   at org.junit.Assert.fail(Assert.java:86)
> >>>>>>   at org.junit.Assert.assertTrue(Assert.java:41)
> >>>>>>   at org.junit.Assert.assertTrue(Assert.java:52)
> >>>>>>   at org.apache.flink.test.accumulators.AccumulatorLiveITCase.checkFlinkAccumulators(AccumulatorLiveITCase.java:189)
> >>>>>>   at org.apache.flink.test.accumulators.AccumulatorLiveITCase.access$1100(AccumulatorLiveITCase.java:68)
> >>>>>
> >>>>> Please see: https://travis-ci.org/mjsax/flink/jobs/71179608
> >>>>>
> >>>>> Does anyone know anything about it?
> >>>>>
> >>>>> BTW: Even if this test is in flink-tests, the problem seems not to be
> >>>>> related to https://issues.apache.org/jira/browse/FLINK-2032, because
> >>>>> accumulators are tested. There are no result files involved (as far as
> >>>>> I can tell).
> >>>>>
> >>>>> -Matthias
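
To make the recommendation at the top more concrete, here is a minimal, hypothetical sketch of sleep-free coordination between the accumulator-producing task and the test, assuming both run in the same JVM (as they do in a mini-cluster ITCase). The class and member names (AccumulatorSyncSketch, UPDATE_AT_JM, signalAccumulatorUpdate, awaitAccumulatorUpdate) are made up for illustration and are not part of AccumulatorLiveITCase.

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;

    import static org.junit.Assert.assertTrue;

    public class AccumulatorSyncSketch {

        // Latch shared between the accumulator-producing task and the test thread.
        private static final CountDownLatch UPDATE_AT_JM = new CountDownLatch(1);

        // Called once a new accumulator value is known to have reached the JM,
        // e.g. from a test hook that observes the accumulator report.
        public static void signalAccumulatorUpdate() {
            UPDATE_AT_JM.countDown();
        }

        // Called from the test instead of sleeping and re-polling:
        // blocks until the signal arrives, or fails after a generous timeout.
        public static void awaitAccumulatorUpdate() throws InterruptedException {
            assertTrue("No accumulator update arrived at the JobManager in time",
                    UPDATE_AT_JM.await(30, TimeUnit.SECONDS));
            // ... now assert on the reported accumulator values ...
        }
    }

In the real test the signal could just as well be an explicit testing message sent when the JM receives an accumulator report; the latch above is only the simplest stand-in for such an explicit notification.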
