Yeah the "List<MatchResult.Metadata> expected" is constructed
from Files.getLastModifiedTime() calls before the files are actually
modified, the code is basically unconditionally broken rather than merely
flaky.

There's several easy options:
1) Use PAssert.that().satisfies() instead of .contains(), and use
assertThat().contains() inside that, with the list constructed at time the
assertion is applied rather than declared.
2) Implement a Matcher<Metadata> that ignores last modified time and use
that

Jeff - your option #3 is unfortunately also race-prone, because the code
may match the files after they have been written but before
setLastModifiedTime was called.

On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas <j...@klukas.net> wrote:

> Another option:
>
> #3 Have the writer thread call Files.setLastModifiedTime explicitly after
> each File.write. Then the lastModifiedMillis can be a stable value for each
> file and we can use those same static values in our expected result. I
> think that would also eliminate the race condition.
>
> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato <ajam...@google.com> wrote:
>
>> Thanks Udi, is there a good example for either of these?
>> #1 - seems like you have to rewrite your assertion logic without the
>> PAssert? Is there some way to capture the pipeline output and iterate over
>> it? The pattern I have seen for this in the past also has thread safety
>> issues (Using a DoFn at the end of the pipeline to add the output to a
>> collection is not safe since the collection can be executed concurrently)
>> #2 - Would BigqueryMatcher be a good example for this? which is used in
>> BigQueryTornadoesIT.java Or is there another example you would suggest
>> looking at for reference?
>>
>>    - I guess to this you need to implement the SerializableMatcher
>>    interface and use the matcher as an option in the pipeline options.
>>
>>
>> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri <eh...@google.com> wrote:
>>
>>> Some options:
>>> - You could wait to assert until after p.waitForFinish().
>>> - You could PAssert using SerializableMatcher and allow any
>>> lastModifiedTime.
>>>
>>> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato <ajam...@google.com> wrote:
>>>
>>>> +Jeff, Eugene,
>>>>
>>>> Hi Jeff and Eugene,
>>>>
>>>> I've noticed that Jeff's PR
>>>> <https://github.com/apache/beam/commit/410d6c7b5f933dcb0280894553c1e576ee4e4884>
>>>>  introduced
>>>> a race condition in this test, but its not clear exactly how to add Jeff's
>>>> test check in a thread safe way. I believe this to be the source of the
>>>> flakeyness Do you have any suggestions Eugene (since you authored this
>>>> test)?
>>>>
>>>> I added some details to this JIRA issue explaining in full
>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>
>>>>
>>>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato <ajam...@google.com> wrote:
>>>>
>>>>> I've seen this fail in a few different PRs for different contributors,
>>>>> and its causing some issues during the presubmit process.. This is a
>>>>> multithreadred test with a lot of sleeps, so it looks a bit suspicious as
>>>>> the source of the problem.
>>>>>
>>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>>>>>
>>>>> I filed a JIRA for this issue:
>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>
>>>>>
>>>>>

Reply via email to