I agree with all of this. But can we please break up the tests and make
them shorter? :)
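
For what it's worth, here is the kind of split I mean as a minimal
sketch, using Python's unittest purely for illustration (the partition
helper and the suite are made up, not from the Spark codebase). Each
test checks exactly one behavior, so a failure points straight at the
broken case and every test stays short:

    import unittest

    def partition(records, num_partitions):
        # Hypothetical helper standing in for whatever logic the
        # real suite exercises.
        buckets = [[] for _ in range(num_partitions)]
        for key, value in records:
            buckets[hash(key) % num_partitions].append((key, value))
        return buckets

    class PartitionSuite(unittest.TestCase):

        def test_empty_input_yields_empty_buckets(self):
            self.assertEqual(partition([], 4), [[], [], [], []])

        def test_every_record_lands_in_some_bucket(self):
            records = [("a", 1), ("b", 2), ("c", 3)]
            buckets = partition(records, 2)
            flattened = sorted(r for b in buckets for r in b)
            self.assertEqual(flattened, sorted(records))

        def test_same_key_always_maps_to_same_bucket(self):
            buckets = partition([("a", 1), ("a", 2)], 8)
            self.assertEqual(len([b for b in buckets if b]), 1)

    if __name__ == "__main__":
        unittest.main()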

On Thu, Apr 2, 2015 at 8:54 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> This is secondary to Marcelo’s question, but I wanted to comment on this:
>
> Its main limitation is more cultural than technical: you need to get people
> to care about intermittent test runs, otherwise you can end up with
> failures that nobody keeps on top of.
>
> This is a problem that plagues Spark as well, but there *is* a technical
> solution.
>
> The solution is simple: *All* the builds that we care about run for *every*
> proposed change. If *any* build fails, the change doesn’t make it into the
> repository.
>
> Spark already has a pull request builder that tests and reports back on
> PRs. Committers don’t merge in PRs when this builder reports that it failed
> some tests. That’s a good thing.
>
> The problem is that there are several other builds that we run on a fixed
> interval, independent of the pull request builder. These builds test
> different configurations, dependency versions, and environments than what
> the PR builder covers. If one of those builds fails, it fails on its own
> little island, with no-one to hear it scream. The build failure is detached
> from the PR that caused it to fail.
>
> What should happen is that the whole matrix of stuff we care to test gets
> run for every PR. No PR goes in if any build we care about fails for that
> PR, and every build we care about runs for every commit of every PR.
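>
> To make that concrete, here is a minimal sketch of such a gate in
> Python; the matrix entries and the ./dev/run-tests invocation are
> stand-ins for whatever the real builds do, so treat this as the shape
> of the thing, not a spec:
>
>     import os
>     import subprocess
>     import sys
>
>     # Hypothetical matrix; a real gate would enumerate every
>     # configuration the scheduled builds currently cover.
>     MATRIX = [
>         {"HADOOP_PROFILE": "hadoop-1.0"},
>         {"HADOOP_PROFILE": "hadoop-2.4"},
>     ]
>
>     def run_matrix():
>         failures = []
>         for overrides in MATRIX:
>             env = dict(os.environ, **overrides)
>             # Stand-in for the entry point the PR builder already
>             # runs for a single configuration.
>             result = subprocess.run(["./dev/run-tests"], env=env)
>             if result.returncode != 0:
>                 failures.append(overrides)
>         return failures
>
>     if __name__ == "__main__":
>         failed = run_matrix()
>         for config in failed:
>             print("FAILED:", config)
>         # A nonzero exit marks the PR build red, which blocks the
>         # merge.
>         sys.exit(1 if failed else 0)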
>
> Really, this is just an extension of the basic idea of the PR builder. It
> doesn’t make much sense to test stuff *after* it has been committed and
> potentially broken things. And it becomes exponentially more difficult to
> find and fix a problem the longer it has been festering in the repo. It’s
> best to keep such problems out in the first place.
>
> With some more work on our CI infrastructure, I think this can be done.
> Maybe even later this year.
>
> Nick
>
> On Thu, Apr 2, 2015 at 6:02 AM Steve Loughran <ste...@hortonworks.com> wrote:
>
>
> > > On 2 Apr 2015, at 06:31, Patrick Wendell <pwend...@gmail.com> wrote:
> > >
> > > Hey Marcelo,
> > >
> > > Great question. Right now, some of the more active developers have an
> > > account that allows them to log into this cluster to inspect logs (we
> > > copy the logs from each run to a node on that cluster). The
> > > infrastructure is maintained by the AMPLab.
> > >
> > > I will put you in touch with someone there who can get you an account.
> > >
> > > This is a short term solution. The longer term solution is to have
> > > these scp'd regularly to an S3 bucket or somewhere people can get
> > > access to them, but that's not ready yet.
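> > >
> > > As a rough sketch of the S3 half: something like this, run from
> > > cron on the node that collects the logs (boto3-based; the bucket
> > > name and log directory are placeholders, not the real setup):
> > >
> > >     import glob
> > >     import os
> > >
> > >     import boto3  # AWS SDK for Python
> > >
> > >     BUCKET = "spark-test-logs"         # placeholder bucket
> > >     LOG_DIR = "/var/lib/jenkins/logs"  # placeholder log location
> > >
> > >     s3 = boto3.client("s3")
> > >     for path in glob.glob(os.path.join(LOG_DIR, "*.log")):
> > >         # Keyed by file name so each run's logs stay addressable.
> > >         key = "jenkins-logs/" + os.path.basename(path)
> > >         s3.upload_file(path, BUCKET, key)
> > >         print("uploaded %s to s3://%s/%s" % (path, BUCKET, key))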
> > >
> > > - Patrick
> > >
> >
> >
> > ASF Jenkins is always there to play with; committers/PMC members should
> > just need to file a BUILD JIRA to get access.
> >
> > Its main limitation is more cultural than technical: you need to get
> > people to care about intermittent test runs, otherwise you can end up
> > with failures that nobody keeps on top of:
> > https://builds.apache.org/view/H-L/view/Hadoop/
> >
> > Someone really needs to own the "keep the builds working" problem, and
> > have the ability to somehow kick others into fixing things. The latter is
> > pretty hard cross-organisation.
> >
> >
> > >> That would be really helpful to debug build failures. The scalatest
> > >> output isn't all that helpful.
> > >>
> >
> > Potentially an issue with the test runner, rather than the tests
> > themselves.
> >
>
