Yeah the "List<MatchResult.Metadata> expected" is constructed from Files.getLastModifiedTime() calls before the files are actually modified, the code is basically unconditionally broken rather than merely flaky.
There's several easy options: 1) Use PAssert.that().satisfies() instead of .contains(), and use assertThat().contains() inside that, with the list constructed at time the assertion is applied rather than declared. 2) Implement a Matcher<Metadata> that ignores last modified time and use that Jeff - your option #3 is unfortunately also race-prone, because the code may match the files after they have been written but before setLastModifiedTime was called. On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas <j...@klukas.net> wrote: > Another option: > > #3 Have the writer thread call Files.setLastModifiedTime explicitly after > each File.write. Then the lastModifiedMillis can be a stable value for each > file and we can use those same static values in our expected result. I > think that would also eliminate the race condition. > > On Tue, Jan 22, 2019 at 7:48 PM Alex Amato <ajam...@google.com> wrote: > >> Thanks Udi, is there a good example for either of these? >> #1 - seems like you have to rewrite your assertion logic without the >> PAssert? Is there some way to capture the pipeline output and iterate over >> it? The pattern I have seen for this in the past also has thread safety >> issues (Using a DoFn at the end of the pipeline to add the output to a >> collection is not safe since the collection can be executed concurrently) >> #2 - Would BigqueryMatcher be a good example for this? which is used in >> BigQueryTornadoesIT.java Or is there another example you would suggest >> looking at for reference? >> >> - I guess to this you need to implement the SerializableMatcher >> interface and use the matcher as an option in the pipeline options. >> >> >> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri <eh...@google.com> wrote: >> >>> Some options: >>> - You could wait to assert until after p.waitForFinish(). >>> - You could PAssert using SerializableMatcher and allow any >>> lastModifiedTime. >>> >>> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato <ajam...@google.com> wrote: >>> >>>> +Jeff, Eugene, >>>> >>>> Hi Jeff and Eugene, >>>> >>>> I've noticed that Jeff's PR >>>> <https://github.com/apache/beam/commit/410d6c7b5f933dcb0280894553c1e576ee4e4884> >>>> introduced >>>> a race condition in this test, but its not clear exactly how to add Jeff's >>>> test check in a thread safe way. I believe this to be the source of the >>>> flakeyness Do you have any suggestions Eugene (since you authored this >>>> test)? >>>> >>>> I added some details to this JIRA issue explaining in full >>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2 >>>> >>>> >>>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato <ajam...@google.com> wrote: >>>> >>>>> I've seen this fail in a few different PRs for different contributors, >>>>> and its causing some issues during the presubmit process.. This is a >>>>> multithreadred test with a lot of sleeps, so it looks a bit suspicious as >>>>> the source of the problem. >>>>> >>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/ >>>>> >>>>> I filed a JIRA for this issue: >>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2 >>>>> >>>>> >>>>>