cool.  FYI, i'm at databricks today and talked w/ patrick, josh and davies
about this.  we have some great ideas to actually make this happen and will
be pushing over the next few weeks to get it done.  :)

On Thu, Apr 2, 2015 at 9:21 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> (Renaming thread so as to un-hijack Marcelo's request.)
>
> Sure, we definitely want tests running faster.
>
> Part of "testing all the things" will be factoring out stuff from the
> various builds that can be run just once.
>
> We've also tried in the past (with little success) to parallelize test
> execution <https://issues.apache.org/jira/browse/SPARK-3431>. That still
> needs work before it becomes possible.
>
> Nick
>
>
> On Thu, Apr 2, 2015 at 11:59 AM shane knapp <skn...@berkeley.edu> wrote:
>
>> i agree with all of this.  but can we please break up the tests and make
>> them shorter?  :)
>>
>> On Thu, Apr 2, 2015 at 8:54 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>
>>> This is secondary to Marcelo’s question, but I wanted to comment on this:
>>>
>>> > Its main limitation is more cultural than technical: you need to get
>>> > people to care about intermittent test runs, otherwise you can end up
>>> > with failures that nobody keeps on top of
>>>
>>> This is a problem that plagues Spark as well, but there *is* a technical
>>> solution.
>>>
>>> The solution is simple: *All* the builds that we care about run for
>>> *every* proposed change. If *any* build fails, the change doesn’t make it
>>> into the repository.
>>>
>>> Spark already has a pull request builder that tests and reports back on
>>> PRs. Committers don’t merge in PRs when this builder reports that it
>>> failed some tests. That’s a good thing.
>>>
>>> The problem is that there are several other builds that we run on a fixed
>>> interval, independent of the pull request builder. These builds test
>>> different configurations, dependency versions, and environments from the
>>> ones the PR builder covers. If one of those builds fails, it fails on its
>>> own little island, with no one to hear it scream. The build failure is
>>> detached from the PR that caused it.
>>>
>>> What should happen is that the whole matrix of stuff we care to test gets
>>> run for every PR. No PR goes in if any build we care about fails for that
>>> PR, and every build we care about runs for every commit of every PR.
>>>
>>> Really, this is just an extension of the basic idea of the PR builder. It
>>> doesn’t make much sense to test stuff *after* it has been committed and
>>> potentially broken things. And it becomes exponentially more difficult to
>>> find and fix a problem the longer it has been festering in the repo. It’s
>>> best to keep such problems out in the first place.
>>>
>>> With some more work on our CI infrastructure, I think this can be done.
>>> Maybe even later this year.
>>>
>>> Nick
>>>
>>>
>>> On Thu, Apr 2, 2015 at 6:02 AM Steve Loughran <ste...@hortonworks.com> wrote:
>>>
>>> > > On 2 Apr 2015, at 06:31, Patrick Wendell <pwend...@gmail.com> wrote:
>>> > >
>>> > > Hey Marcelo,
>>> > >
>>> > > Great question. Right now, some of the more active developers have an
>>> > > account that allows them to log into this cluster to inspect logs (we
>>> > > copy the logs from each run to a node on that cluster). The
>>> > > infrastructure is maintained by the AMPLab.
>>> > >
>>> > > I will put you in touch with someone there who can get you an account.
>>> > >
>>> > > This is a short term solution. The longer term solution is to have
>>> > > these scp'd regularly to an S3 bucket or somewhere people can get
>>> > > access to them, but that's not ready yet.
>>> > >
>>> > > - Patrick
>>> > >
>>> > >>
>>> >
>>> >
>>> > ASF Jenkins is always there to play with; committers/PMC members should
>>> > just need to file a BUILD JIRA to get access.
>>> >
>>> > Its main limitation is more cultural than technical: you need to get
>>> > people to care about intermittent test runs, otherwise you can end up
>>> > with failures that nobody keeps on top of
>>> > https://builds.apache.org/view/H-L/view/Hadoop/
>>> >
>>> > Someone really needs to own the "keep the builds working" problem, and
>>> > have the ability to somehow kick others into fixing things. The latter is
>>> > pretty hard across organisations.
>>> >
>>> >
>>> > >> That would be really helpful to debug build failures. The scalatest
>>> > >> output isn't all that helpful.
>>> > >>
>>> >
>>> > Potentially an issue with the test runner, rather than the tests
>>> > themselves.
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> > For additional commands, e-mail: dev-h...@spark.apache.org
>>> >
>>>
>>
