I'll work on getting a PR together this morning, probably following Eugene's suggestion #1.
On Tue, Jan 22, 2019 at 8:34 PM Udi Meiri <eh...@google.com> wrote: > Alex, the only way to implement my suggestion #1 (that I know of) would be > to write to a file and read it back. > I don't have good example for #2. > > Eugene's suggestion no. 1 seems like a good idea. There are some example > <https://github.com/apache/beam/blob/324a1bcc820945731ccce7dd7e5354247b841356/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOWriteTest.java#L335-L340> > in the codebase. > > On Tue, Jan 22, 2019 at 5:16 PM Eugene Kirpichov <kirpic...@google.com> > wrote: > >> Yeah the "List<MatchResult.Metadata> expected" is constructed >> from Files.getLastModifiedTime() calls before the files are actually >> modified, the code is basically unconditionally broken rather than merely >> flaky. >> >> There's several easy options: >> 1) Use PAssert.that().satisfies() instead of .contains(), and use >> assertThat().contains() inside that, with the list constructed at time the >> assertion is applied rather than declared. >> 2) Implement a Matcher<Metadata> that ignores last modified time and use >> that >> >> Jeff - your option #3 is unfortunately also race-prone, because the code >> may match the files after they have been written but before >> setLastModifiedTime was called. >> >> On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas <j...@klukas.net> wrote: >> >>> Another option: >>> >>> #3 Have the writer thread call Files.setLastModifiedTime explicitly >>> after each File.write. Then the lastModifiedMillis can be a stable value >>> for each file and we can use those same static values in our expected >>> result. I think that would also eliminate the race condition. >>> >>> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato <ajam...@google.com> wrote: >>> >>>> Thanks Udi, is there a good example for either of these? >>>> #1 - seems like you have to rewrite your assertion logic without the >>>> PAssert? Is there some way to capture the pipeline output and iterate over >>>> it? The pattern I have seen for this in the past also has thread safety >>>> issues (Using a DoFn at the end of the pipeline to add the output to a >>>> collection is not safe since the collection can be executed concurrently) >>>> #2 - Would BigqueryMatcher be a good example for this? which is used in >>>> BigQueryTornadoesIT.java Or is there another example you would suggest >>>> looking at for reference? >>>> >>>> - I guess to this you need to implement the SerializableMatcher >>>> interface and use the matcher as an option in the pipeline options. >>>> >>>> >>>> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri <eh...@google.com> wrote: >>>> >>>>> Some options: >>>>> - You could wait to assert until after p.waitForFinish(). >>>>> - You could PAssert using SerializableMatcher and allow any >>>>> lastModifiedTime. >>>>> >>>>> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato <ajam...@google.com> wrote: >>>>> >>>>>> +Jeff, Eugene, >>>>>> >>>>>> Hi Jeff and Eugene, >>>>>> >>>>>> I've noticed that Jeff's PR >>>>>> <https://github.com/apache/beam/commit/410d6c7b5f933dcb0280894553c1e576ee4e4884> >>>>>> introduced >>>>>> a race condition in this test, but its not clear exactly how to add >>>>>> Jeff's >>>>>> test check in a thread safe way. I believe this to be the source of the >>>>>> flakeyness Do you have any suggestions Eugene (since you authored this >>>>>> test)? >>>>>> >>>>>> I added some details to this JIRA issue explaining in full >>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2 >>>>>> >>>>>> >>>>>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato <ajam...@google.com> >>>>>> wrote: >>>>>> >>>>>>> I've seen this fail in a few different PRs for different >>>>>>> contributors, and its causing some issues during the presubmit process.. >>>>>>> This is a multithreadred test with a lot of sleeps, so it looks a bit >>>>>>> suspicious as the source of the problem. >>>>>>> >>>>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/ >>>>>>> >>>>>>> I filed a JIRA for this issue: >>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2 >>>>>>> >>>>>>> >>>>>>>