A useful tool for investigating test flakiness is my Jenkins Test Explorer
service, running at https://spark-tests.appspot.com/

This has some useful timeline views for debugging flaky builds. For
instance, at
https://spark-tests.appspot.com/jobs/spark-master-test-maven-hadoop-2.6 (may
be slow to load) you can see this chart: https://i.imgur.com/j8LV3pX.png.
Here, each column represents a test run and each row represents a test
which failed at least once over the displayed time period.

In that linked example screenshot you'll notice that a few columns have
grey squares indicating that tests were skipped but lack any red squares to
indicate test failures. This usually indicates that the build failed due to
a problem other than an individual test failure. For example, I clicked
into one of those builds and found that one test suite failed in test setup
because the previous suite had not properly cleaned up its SparkContext
(I'll file a JIRA for this).
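
(For context on that failure mode: if a suite finishes without stopping its
SparkContext, the next suite's setup can fail when it tries to create a new
one. The real suites are Scala/ScalaTest, but here's a rough PySpark/unittest
analogue of the cleanup pattern, just to illustrate the idea -- the suite name
and test are made up.)

```python
import unittest
from pyspark import SparkContext

class ExampleSuite(unittest.TestCase):
    """Illustrative only: make sure each suite releases its SparkContext."""

    @classmethod
    def setUpClass(cls):
        # If a previous suite leaked its context, creating a new one here
        # fails -- the failure mode described above.
        cls.sc = SparkContext("local[2]", "example-suite")

    @classmethod
    def tearDownClass(cls):
        # Stopping the context here is what keeps the *next* suite's setup
        # from failing.
        cls.sc.stop()

    def test_count(self):
        self.assertEqual(self.sc.parallelize(range(10)).count(), 10)
```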

You can click through the interface to drill down to reports on individual
builds, tests, suites, etc. As an example of an individual test's detail
page,
https://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.rdd.LocalCheckpointSuite&test_name=missing+checkpoint+block+fails+with+informative+message
shows
the patterns of flakiness in a streaming checkpoint test.

Finally, there's an experimental "interesting new test failures" report
which tries to surface tests which have started failing very recently:
https://spark-tests.appspot.com/failed-tests/new. Specifically, entries in
this feed are test failures which a) occurred in the last week, b) were not
part of a build which had 20 or more failed tests, c) were not observed to
fail during the previous week (i.e. no failures in [2 weeks ago, 1 week
ago)), and d) represent the first time that the test failed this week (i.e.
a test case will appear at most once in the results list). I've
also exposed this as an RSS feed at
https://spark-tests.appspot.com/rss/failed-tests/new.
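
For anyone curious how that filtering works, the criteria above boil down to
something like the sketch below. This is not the actual service code; the
record format (test name, build id, timestamp) and the helper names are
assumptions made purely for illustration.

```python
from datetime import datetime, timedelta

def interesting_new_failures(failures, failures_per_build, now=None):
    """Sketch of criteria (a)-(d) for the 'interesting new test failures' feed.

    `failures` is a list of (test_name, build_id, timestamp) records and
    `failures_per_build` maps build_id -> number of failed tests in that build.
    """
    now = now or datetime.utcnow()
    week_ago = now - timedelta(days=7)
    two_weeks_ago = now - timedelta(days=14)

    # (c) tests that also failed in [2 weeks ago, 1 week ago) are excluded.
    failed_previous_week = {
        name for name, _, ts in failures if two_weeks_ago <= ts < week_ago
    }

    seen = set()
    results = []
    for name, build_id, ts in sorted(failures, key=lambda f: f[2]):
        if ts < week_ago:                              # (a) last week only
            continue
        if failures_per_build.get(build_id, 0) >= 20:  # (b) skip mass-failure builds
            continue
        if name in failed_previous_week:               # (c) must be newly flaky
            continue
        if name in seen:                               # (d) first occurrence only
            continue
        seen.add(name)
        results.append((name, build_id, ts))
    return results
```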


On Wed, Feb 15, 2017 at 12:51 PM Saikat Kanjilal <sxk1...@hotmail.com>
wrote:

I would recommend we just open JIRAs for unit tests based on module
(core/ml/sql, etc.) and fix this one module at a time; this at least keeps
the number of unit tests needing fixing down to a manageable number.


------------------------------
*From:* Armin Braun <m...@obrown.io>
*Sent:* Wednesday, February 15, 2017 12:48 PM
*To:* Saikat Kanjilal
*Cc:* Kay Ousterhout; dev@spark.apache.org
*Subject:* Re: File JIRAs for all flaky test failures

I think one thing that is contributing to this a lot too is the general
issue of the tests taking up a lot of file descriptors (10k+ if I run them
on a standard Debian machine).
There are a few suites that contribute to this in particular, like
`org.apache.spark.ExecutorAllocationManagerSuite`, which, like a few
others, appears to consume a lot of fds.

Wouldn't it make sense to open JIRAs about those and actively try to reduce
the resource consumption of these tests?
Seems to me these can cause a lot of unpredictable behavior (making the
reason for flaky tests hard to identify, especially when there are timeouts
etc. involved), and they make it prohibitively expensive for many to test
locally imo.
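
One crude way to see which suites are fd-heavy is to sample the open-descriptor
count of the test JVM while a suite runs. A minimal sketch, assuming Linux
(/proc) and that you already know the PID of the JVM running the suite -- both
of which are assumptions on my part:

```python
import os
import time

def fd_count(pid):
    """Number of open file descriptors for a process, via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))

def sample_fds(pid, interval=1.0, duration=60.0):
    """Print the fd count of `pid` every `interval` seconds for `duration` seconds."""
    end = time.time() + duration
    peak = 0
    while time.time() < end:
        count = fd_count(pid)
        peak = max(peak, count)
        print(f"pid={pid} open fds={count} (peak={peak})")
        time.sleep(interval)

# Usage (hypothetical PID of the JVM running the suite under investigation):
# sample_fds(12345)
```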

On Wed, Feb 15, 2017 at 9:24 PM, Saikat Kanjilal <sxk1...@hotmail.com>
wrote:

I was working on something to address this a while ago
(https://issues.apache.org/jira/browse/SPARK-9487), but the difficulty of
testing locally made things a lot more complicated to fix for each of the
unit tests. Should we resurface this JIRA again? I would wholeheartedly
agree with the flakiness assessment of the unit tests.



------------------------------
*From:* Kay Ousterhout <kayousterh...@gmail.com>
*Sent:* Wednesday, February 15, 2017 12:10 PM
*To:* dev@spark.apache.org
*Subject:* File JIRAs for all flaky test failures

Hi all,

I've noticed the Spark tests getting increasingly flaky -- it seems more
common than not now that the tests need to be re-run at least once on PRs
before they pass.  This is both annoying and problematic because it makes
it harder to tell when a PR is introducing new flakiness.

To try to clean this up, I'd propose filing a JIRA *every time* Jenkins
fails on a PR (for a reason unrelated to the PR).  Just provide a quick
description of the failure -- e.g., "Flaky test: DagSchedulerSuite" or
"Tests failed because 250m timeout expired", a link to the failed build,
and include the "Tests" component.  If there's already a JIRA for the
issue, just comment with a link to the latest failure.  I know folks don't
always have time to track down why a test failed, but this is at least
helpful to someone else who, later on, is trying to diagnose when the issue
started in order to find the problematic code / test.
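
(Most people will of course just use the web UI, but for anyone scripting
this, here's a rough sketch of filing such an issue through the standard JIRA
REST API. The summary, description, build link, and credentials are all
placeholders, not real values.)

```python
import requests

# Rough sketch only: filing a flaky-test JIRA via the JIRA REST API.
# Summary, description, and credentials below are placeholders.
issue = {
    "fields": {
        "project": {"key": "SPARK"},
        "issuetype": {"name": "Bug"},
        "components": [{"name": "Tests"}],
        "summary": "Flaky test: DagSchedulerSuite",
        "description": "Link to failed build: <paste Jenkins build URL here>. "
                       "Failure appears unrelated to the PR under test.",
    }
}

resp = requests.post(
    "https://issues.apache.org/jira/rest/api/2/issue",
    json=issue,
    auth=("your-asf-jira-user", "your-password"),  # placeholder credentials
)
resp.raise_for_status()
print("Filed", resp.json()["key"])
```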

If this seems like too high overhead, feel free to suggest alternative ways
to make the tests less flaky!

-Kay
