Do we have inotifywait available on Travis, and could we set it up to log
concurrent access to the relevant JAR files?
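
A rough sketch of how such a log could be checked for overlapping access, in case it helps. Everything here is an assumption rather than something we already have: the log format (`<timestamp> <path> <EVENTS>`, one event per line), the function name `find_overlaps`, and the inotifywait invocation in the comment that might produce such a log. It flags any OPEN of a JAR that arrives after a MODIFY and before the matching CLOSE_WRITE. inotify does not say which process did what, so a hit is only a hint, not proof of a race.

```python
# Hypothetical log format, one event per line:
#   <timestamp> <path> <EVENT[,EVENT...]>
# A log like this could plausibly be produced with something like:
#   inotifywait -m -r --timefmt '%s' --format '%T %w%f %e' \
#       -e open -e modify -e close_write <build-output-dir>
# (the directory to watch is an assumption; adjust to wherever the JARs land)


def find_overlaps(lines, suffix=".jar"):
    """Return (timestamp, path) pairs where a file was opened mid-write.

    A file counts as "being written" from its first MODIFY event until
    the next CLOSE_WRITE event on the same path.
    """
    writing = set()   # paths with a MODIFY seen and no CLOSE_WRITE yet
    overlaps = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Timestamp is the first token, events the last; the path in
        # between may itself contain spaces.
        stamp, rest = line.split(" ", 1)
        path, events = rest.rsplit(" ", 1)
        if not path.endswith(suffix):
            continue
        names = set(events.split(","))
        if "OPEN" in names and path in writing:
            overlaps.append((stamp, path))
        if "MODIFY" in names:
            writing.add(path)
        if "CLOSE_WRITE" in names:
            writing.discard(path)
    return overlaps


if __name__ == "__main__":
    sample = [
        "100 build/libs/a.jar OPEN",
        "101 build/libs/a.jar MODIFY",
        "102 build/libs/a.jar OPEN",
        "103 build/libs/a.jar CLOSE_WRITE,CLOSE",
        "104 build/libs/a.jar OPEN",
    ]
    for stamp, path in find_overlaps(sample):
        print(f"{stamp}: {path} opened while a write was in progress")
```

A writer that reopens its own output would also be flagged, so any hit still needs to be matched against the Gradle task log by timestamp.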
On 10.09.18 22:41, Lukasz Cwik wrote:
I had originally suggested using some Linux kernel tooling such as
inotifywait[1] to watch what is happening.
It is likely that we have some Gradle task running in parallel with
a different Gradle task when it shouldn't, which means that
the jar file is being changed/corrupted. I believe fixing our Gradle
task dependency tree with respect to this would solve the problem. This crash
does not reproduce on my desktop after 20 runs, which makes it hard for
me to test for.
1: https://www.linuxjournal.com/content/linux-filesystem-events-inotify
On Mon, Sep 10, 2018 at 1:15 PM Ryan Williams wrote:
this continues to be an issue locally (cf. some discussion in #beam
slack)
commands like `./gradlew javaPreCommit` or `./gradlew build`
reliably fail with a range of different JVM crashes in a few
different tasks, with messages that suggest filing a bug
against the Java compiler:
<https://gist.github.com/ryan-williams/d9d3f7bd5f67c7c715e68ae4107aa4a0#file-1-javaprecommit-fails-L881-L903>
<https://gist.github.com/ryan-williams/d9d3f7bd5f67c7c715e68ae4107aa4a0#file-3-compile-failures-L328-L329>
what do we know about the actual race condition that is allowing one
task to attempt to read from a JAR that is being overwritten by
another task? presumably this is just a bug in our Gradle configs?
On Mon, Aug 27, 2018 at 2:28 PM Andrew Pilloud wrote:
It appears that there is no one working on a fix for the flakes,
so I've merged the change to disable parallel tasks on precommit.
Andrew
On Fri, Aug 24, 2018 at 1:30 PM Andrew Pilloud wrote:
I'm seeing failures due to this on 12 of the last 16
PostCommits. Precommits take about 22 minutes when run in
parallel, so at a 25% pass rate that puts the expected time
to a good test run at 264 minutes, assuming you immediately
restart on each failure. We are looking at 56 minutes for a
precommit that isn't run in parallel:
https://builds.apache.org/job/beam_PreCommit_Java_Phrase/266/
I'd rather have tests take a little longer than have to monitor
them for several hours.
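
For anyone who wants to redo this estimate with other numbers, here is a small sketch under a simple model. The model is an assumption: every run takes the full duration, runs pass independently with a fixed probability, and you restart immediately after a failure. The function name `expected_minutes` is made up for illustration. With a pass probability p, the expected number of attempts is 1/p, so at 4 passes out of 16 and 22 minutes per run this model gives 4 attempts and 88 minutes. The 264-minute figure above equals 12 x 22, that is, one full run per observed failure. Either way the parallel estimate is above the 56-minute serial run.

```python
def expected_minutes(pass_rate, minutes_per_run):
    """Expected total minutes until the first passing run.

    Assumes independent runs with a fixed pass probability (a geometric
    distribution), so the expected number of attempts is 1 / pass_rate.
    """
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    return minutes_per_run / pass_rate


if __name__ == "__main__":
    failures, total = 12, 16              # figures quoted in this thread
    pass_rate = (total - failures) / total
    parallel = expected_minutes(pass_rate, 22)
    serial = expected_minutes(1.0, 56)    # assumes the serial run always passes
    print(f"pass rate: {pass_rate:.0%}")
    print(f"parallel, with retries: {parallel:.0f} min")
    print(f"serial, single run:     {serial:.0f} min")
```

The serial figure assumes the non-parallel run never flakes, which we have not measured.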
I've opened a PR: https://github.com/apache/beam/pull/6274
Andrew
On Fri, Aug 24, 2018 at 10:47 AM Lukasz Cwik wrote:
I believe it would mitigate the issue but also make the
jobs take much longer to complete.
On Thu, Aug 23, 2018 at 2:44 PM Andrew Pilloud wrote:
There seems to be a misconfiguration of Gradle that
has been causing a high rate of failure for the last
several weeks in building beam-examples-java and
beam-runners-apex. It appears to be some sort of
race condition in building dependencies. Given that
no one has made progress on fixing the root cause,
is this something we could mitigate by running jobs
with the `--no-parallel` flag?
https://issues.apache.org/jira/browse/BEAM-5035
https://issues.apache.org/jira/browse/BEAM-5207
Andrew