I've also seen the BufferSpillerTest fail: https://travis-ci.org/apache/flink/jobs/74057503
On Tue, 4 Aug 2015 at 14:10 Robert Metzger <rmetz...@apache.org> wrote:

> I've assigned https://issues.apache.org/jira/browse/FLINK-1680 to myself.
> Maybe Tachyon 0.7 will fix the issues.
>
> On Tue, Aug 4, 2015 at 1:57 PM, Stephan Ewen <se...@apache.org> wrote:
>
>> Yes.
>>
>> We should know, though, whether this is a Java 6 bug, or a bug in our
>> system that just happens to occur only with Java 6 (because of different
>> timings in this other engine).
>>
>> On Tue, Aug 4, 2015 at 12:27 PM, Chesnay Schepler
>> <chesnay.schep...@fu-berlin.de> wrote:
>>
>>> Aren't we dropping Java 6 support?
>>>
>>> On 04.08.2015 12:21, Stephan Ewen wrote:
>>>
>>>> The "StateCheckpointedITCase", which also tests these guarantees
>>>> thoroughly, has not failed so far.
>>>>
>>>> But we need to first rule out the BarrierBuffer. The problem is that
>>>> the bug occurs only on Java 6 and cannot be reproduced locally...
>>>>
>>>> On Tue, Aug 4, 2015 at 12:14 PM, Gyula Fóra <gyula.f...@gmail.com>
>>>> wrote:
>>>>
>>>>> Honestly I don't think the partitioned state changes have anything
>>>>> to do with the stability, only the reworked test case, which now
>>>>> tests proper exactly-once, which was missing before.
>>>>>
>>>>> Stephan Ewen <se...@apache.org> wrote (on Tue, 4 Aug 2015, 12:12):
>>>>>
>>>>>> Yes, the build stability is super serious right now.
>>>>>>
>>>>>> Here are the problems in question, and what we could do about this:
>>>>>>
>>>>>> BarrierBuffer:
>>>>>> --------------------
>>>>>> Barrier Buffer tests fail in Java 6 builds.
>>>>>>
>>>>>> I have not found a way to diagnose that problem, yet, but if we
>>>>>> cannot find the issue today, I would be willing to revert my latest
>>>>>> commits on the barrier buffer to increase the stability.
>>>>>>
>>>>>> StreamCheckpointingITCase
>>>>>> -------------------------------------------
>>>>>> This seems to have started with either the barrier buffer, or the
>>>>>> updated partitioned state. If fixing/reverting the barrier buffer
>>>>>> does not fix it, and no fix has come up until then, let's revert
>>>>>> the latest changes to the partitioned state and re-add them when
>>>>>> they are stable.
>>>>>>
>>>>>> Tachyon:
>>>>>> -------------
>>>>>> The Tachyon mini cluster has a problem: apparently, the programs
>>>>>> exit with a sysexit or segfault.
>>>>>>
>>>>>> Since we have no Tachyon code ourselves, do we need this test as
>>>>>> part of the nightly tests?
>>>>>> Can we make this a "manual" test that we trigger on demand?
>>>>>>
>>>>>> Greetings,
>>>>>> Stephan
>>>>>>
>>>>>> On Tue, Aug 4, 2015 at 11:41 AM, Aljoscha Krettek
>>>>>> <aljos...@apache.org> wrote:
>>>>>>
>>>>>>> I've also seen this fail:
>>>>>>> https://travis-ci.org/apache/flink/jobs/74025862
>>>>>>> in SuccessAfterNetworkBuffersFailureITCase
>>>>>>>
>>>>>>> Build seems quite flaky recently.
>>>>>>>
>>>>>>> On Tue, 4 Aug 2015 at 10:27 Matthias J. Sax
>>>>>>> <mj...@informatik.hu-berlin.de> wrote:
>>>>>>>
>>>>>>>> Rebased on:
>>>>>>>>
>>>>>>>> https://github.com/mjsax/flink/commit/fab61a1954ff1554448e826e1d273689ed520fc3
>>>>>>>>
>>>>>>>> But if the gap between two rebases is large, it's hard to say
>>>>>>>> what the problem might be...
>>>>>>>>
>>>>>>>> The old parent commit (i.e., rebase before last rebase) was
>>>>>>>>
>>>>>>>> https://github.com/mjsax/flink/commit/148395bcd81a93bcb1473e4e93f267edb3b71c7e
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>> On 08/04/2015 08:57 AM, Aljoscha Krettek wrote:
>>>>>>>>
>>>>>>>>> What are the commits that you rebased on? Could you maybe
>>>>>>>>> narrow down what caused the regression?
>>>>>>>>>
>>>>>>>>> On Mon, 3 Aug 2015 at 23:31 Matthias J. Sax
>>>>>>>>> <mj...@informatik.hu-berlin.de> wrote:
>>>>>>>>>
>>>>>>>>>> I only report failing tests after a rebase. ;)
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>>
>>>>>>>>>> On 08/03/2015 11:23 PM, Henry Saputra wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for reporting it, Matthias. Will try to run Travis
>>>>>>>>>>> for latest Flink.
>>>>>>>>>>>
>>>>>>>>>>> Tachyon test is a bit flaky. Maybe updating to latest
>>>>>>>>>>> release could help.
>>>>>>>>>>>
>>>>>>>>>>> - Henry
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 3, 2015 at 2:18 PM, Matthias J. Sax
>>>>>>>>>>> <mj...@informatik.hu-berlin.de> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Today, not a single build was completely successful.
>>>>>>>>>>>> Please see here:
>>>>>>>>>>>>
>>>>>>>>>>>> Flink Streaming Core:
>>>>>>>>>>>> https://travis-ci.org/mjsax/flink/jobs/73938109
>>>>>>>>>>>> https://travis-ci.org/mjsax/flink/jobs/73951362
>>>>>>>>>>>> https://travis-ci.org/apache/flink/jobs/73938124
>>>>>>>>>>>> https://travis-ci.org/apache/flink/jobs/73899795
>>>>>>>>>>>> https://travis-ci.org/apache/flink/jobs/73938122
>>>>>>>>>>>> https://travis-ci.org/apache/flink/jobs/73952441
>>>>>>>>>>>>
>>>>>>>>>>>> Flink Tachyon:
>>>>>>>>>>>> https://travis-ci.org/apache/flink/jobs/73938123
>>>>>>>>>>>>
>>>>>>>>>>>> -Matthias