I had originally suggested using Linux filesystem-event tooling such as
inotifywait[1] to watch what is happening.

It is likely that some Gradle task is running in parallel with another
Gradle task when it shouldn't be, which means that the jar file is being
changed or corrupted while it is in use. I believe fixing our Gradle task
dependency tree to rule this out would solve the problem. The crash did not
reproduce on my desktop in 20 runs, which makes it hard for me to test
for.
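
For anyone who can reproduce the crash but cannot easily run inotifywait,
here is a rough sketch of the same idea in Python: poll a single jar and log
whenever its size or modification time changes while a build is running.
The function names and the polling interval are my own assumptions, not
anything in our build. Polling can miss changes that happen between two
samples, so inotify remains the more reliable tool on Linux.

```python
import os
import time
from typing import Optional, Tuple

# (size in bytes, mtime in nanoseconds), or None if the file does not exist.
Snapshot = Optional[Tuple[int, int]]


def snapshot(path: str) -> Snapshot:
    """Return the current size and mtime of path, or None if it is missing."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return (st.st_size, st.st_mtime_ns)


def describe_change(old: Snapshot, new: Snapshot) -> Optional[str]:
    """Classify the difference between two snapshots; None means no change."""
    if old == new:
        return None
    if old is None:
        return "created"
    if new is None:
        return "deleted"
    return "modified"


def watch(path: str, interval: float = 0.05, duration: float = 60.0) -> None:
    """Poll path for `duration` seconds and print every observed change."""
    deadline = time.monotonic() + duration
    last = snapshot(path)
    while time.monotonic() < deadline:
        time.sleep(interval)
        current = snapshot(path)
        event = describe_change(last, current)
        if event is not None:
            stamp = time.strftime("%H:%M:%S")
            print(f"{stamp} {event}: {path} {last} -> {current}")
        last = current
```

Calling `watch()` on the jar named in a crash log, in a second terminal
while `./gradlew build` runs, should show whether the jar is rewritten more
than once during the build. It will not say which task did the writing, so
the timestamps would still need to be matched against the Gradle output.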

1: https://www.linuxjournal.com/content/linux-filesystem-events-inotify

On Mon, Sep 10, 2018 at 1:15 PM Ryan Williams <[email protected]> wrote:

> this continues to be an issue locally (cf. some discussion in #beam slack)
>
> commands like `./gradlew javaPreCommit` or `./gradlew build` reliably fail
> with a range of different
> <https://gist.github.com/ryan-williams/d9d3f7bd5f67c7c715e68ae4107aa4a0#file-1-javaprecommit-fails-L881-L903>
> JVM crashes
> <https://gist.github.com/ryan-williams/d9d3f7bd5f67c7c715e68ae4107aa4a0#file-3-compile-failures-L328-L329>
> in a few different tasks, with messages that suggest filing a bug against
> the Java compiler
>
> what do we know about the actual race condition that is allowing one task
> to attempt to read from a JAR that is being overwritten by another task?
> presumably this is just a bug in our Gradle configs?
>
> On Mon, Aug 27, 2018 at 2:28 PM Andrew Pilloud <[email protected]>
> wrote:
>
>> It appears that there is no one working on a fix for the flakes, so I've
>> merged the change to disable parallel tasks on precommit.
>>
>> Andrew
>>
>> On Fri, Aug 24, 2018 at 1:30 PM Andrew Pilloud <[email protected]>
>> wrote:
>>
>>> I'm seeing failures due to this on 12 of the last 16 PostCommits.
>>> Precommits take about 22 minutes when run in parallel, so at a 25% pass
>>> rate that puts the expected time to a good test run at 264 minutes,
>>> assuming you immediately restart on each failure. We are looking at 56
>>> minutes for a precommit that isn't run in parallel:
>>> https://builds.apache.org/job/beam_PreCommit_Java_Phrase/266/ I'd
>>> rather have tests take a little longer than have to monitor them for
>>> several hours.
>>>
>>> I've opened a PR: https://github.com/apache/beam/pull/6274
>>>
>>> Andrew
>>>
>>> On Fri, Aug 24, 2018 at 10:47 AM Lukasz Cwik <[email protected]> wrote:
>>>
>>>> I believe it would mitigate the issue but also make the jobs take much
>>>> longer to complete.
>>>>
>>>> On Thu, Aug 23, 2018 at 2:44 PM Andrew Pilloud <[email protected]>
>>>> wrote:
>>>>
>>>>> There seems to be a misconfiguration of Gradle that has been causing a
>>>>> high rate of failure for the last several weeks in building
>>>>> beam-examples-java and beam-runners-apex. It appears to be some sort of
>>>>> race condition in building dependencies. Given that no one has made
>>>>> progress on fixing the root cause, is this something we could mitigate
>>>>> by running jobs with the `--no-parallel` flag?
>>>>>
>>>>> https://issues.apache.org/jira/browse/BEAM-5035
>>>>> https://issues.apache.org/jira/browse/BEAM-5207
>>>>>
>>>>> Andrew
>>>>>
>>>>
