Hi,

It looks like we had another timeout on the daily build:
https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/677/console

Deron

On Thu, Dec 8, 2016 at 9:59 PM, Acs S <ac...@yahoo.com.invalid> wrote:

> +1 On adding Jenkins Build machines on PR builds.
> Couple of times I hit waiting PR builds due to queue. If that is not
> common, we can wait.
> -Arvind
>
> From: Deron Eriksson <deroneriks...@gmail.com>
> To: dev@systemml.incubator.apache.org
> Sent: Friday, December 9, 2016 7:34 AM
> Subject: Re: test suite running slowly after disable cache/sparse commit?
>
> Hi Fred,
>
> The last two daily tests ran around ~2:56 hr, so if this number is
> stable, it seems that the new tests potentially add about half an hour
> to the test suite time. I would like if we could decrease the test suite
> time rather than add significantly to it. In fact, personally I'd prefer
> if we could do something like move the time-consuming algorithm-type
> tests out of the main test suite and just run the algorithm tests daily
> (if this is technically possible). That way, we could get the main test
> suite time to be sped up significantly but still benefit from daily test
> coverage provided by the algorithm tests. I like the idea of a short
> test suite time since that makes it easier to get feedback and continue
> working on an issue that day. If the tests take too long to run, it
> means that issues that could potentially be solved in one day will get
> pushed out to another day.
>
> Increasing the number of simultaneous Jenkins jobs allowed could help
> with queued-up builds, which would be nice. Currently Jenkins runs a max
> of two simultaneous jobs. Jenkins currently handles:
> 1) two daily builds (at noon and at midnight)
> 2) on-demand builds (so a developer can commit some code on a branch and
>    then have jenkins build/test so that a developer's machine isn't tied
>    up)
> 3) pull request builds (the initial push with a PR will trigger this
>    along with any subsequent pushes to the branch referenced by the PR).
>
> Today there is not a queue, but I'm the only person to trigger a PR
> build today. If more than two developers are submitting PRs that day,
> there will be a queue. This queue has been manageable, but if the
> increase in test suite time is a permanent thing, I'd recommend bumping
> the simultaneous Jenkins jobs from two to four.
>
> Deron
>
> On Thu, Dec 8, 2016 at 4:49 PM, Frederick R Reiss <frre...@us.ibm.com>
> wrote:
>
> > +dev list
> >
> > I personally don't mind letting the regression suite run overnight.
> > The important thing is that we do not push changes that have not
> > passed the full automated test suite. In the interest of efficiency,
> > we shouldn't even be reviewing most PRs until after they have passed
> > the automated tests.
> >
> > Deron, are you seeing a backlog of not-yet-started builds queueing up
> > on the PR build server? If the queue is getting long, we can add
> > additional machines to the Jenkins cluster.
> >
> > Fred
> >
> > From: Deron Eriksson/San Francisco/IBM
> > To: Niketan Pansare/Almaden/IBM@IBMUS
> > Cc: Berthold Reinwald/Almaden/IBM@IBMUS, Frederick R
> > Reiss/Almaden/IBM@IBMUS
> > Date: 12/08/2016 11:06 AM
> > Subject: Re: test suite running slowly after disable cache/sparse
> > commit?
> > ------------------------------
> >
> > Hi Niketan,
> >
> > Perhaps Berthold or Fred could add a little guidance here in terms of
> > what is acceptable? Having the test suite go from 2:21 to 3:41 (one
> > pull request yesterday took 4:11 to complete -
> > https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/909/)
> > is very serious to me. Even if the test suite runs at 3:00, this is a
> > serious slowdown. It slows down our ability to validate pull requests
> > and other code on jenkins.
> >
> > Deron
> >
> > ----- Original message -----
> > From: Niketan Pansare/Almaden/IBM
> > To: Deron Eriksson/San Francisco/IBM@ibmus
> > Cc: Berthold Reinwald/Almaden/IBM@ibmus, Frederick R
> > Reiss/Almaden/IBM@ibmus
> > Subject: Re: test suite running slowly after disable cache/sparse
> > commit?
> > Date: Thu, Dec 8, 2016 8:55 AM
> >
> > Hi Deron,
> >
> > The commit replicated application tests for disable sparse and disable
> > caching. So, the test time should increase. We should increase the
> > duration or reduce the number of application tests we want to test
> > with caching and sparse disabled.
> >
> > Thanks
> >
> > Niketan
> >
> > On Dec 8, 2016, at 7:47 AM, Deron Eriksson <de...@us.ibm.com> wrote:
> >
> > Hi Niketan,
> >
> > I noticed the daily test yesterday timed out, probably because of a
> > long-running test.
> >
> > Looking at the commits from the day before
> > (https://github.com/apache/incubator-systemml/commits/master), I
> > noticed that [SYSTEMML-769] [SYSTEMML-1140] Removed -disable-caching
> > and -disable-…
> > (https://github.com/apache/incubator-systemml/commit/caaaec90b61e529e50021d89f9f108230fa307a8)
> > updated some of the tests.
> >
> > So I ran the tests on the previous commit
> > (https://sparktc.ibmcloud.com/jenkins/job/SystemML-OnDemand/227/)
> > and the tests ran in 2hr 21min.
> >
> > I ran the tests on the 'disable caching...' commit
> > (https://sparktc.ibmcloud.com/jenkins/job/SystemML-OnDemand/228/)
> > and the tests ran in 3hr 41min.
> >
> > One thing that is confusing to me is that the nightly test just
> > completed successfully
> > (https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/674/)
> > in 2hr 57min and did not time out like yesterday afternoon. So it is
> > always possible it could be a server issue.
> >
> > Could you look into this and see if that commit introduced an issue
> > with the tests?
> >
> > Thanks!
> > Deron
>
> --
> Deron Eriksson
> Spark Technology Center
> http://www.spark.tc/

--
Deron Eriksson
Spark Technology Center
http://www.spark.tc/
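
For reference, the run times quoted in this thread can be compared against the 2hr 21min baseline with a few lines of Python. This is only a rough sketch: the helper names (`to_minutes`, `slowdown`) are made up for illustration, and the durations are simply the ones reported in the messages above, not fresh measurements.

```python
from typing import Tuple


def to_minutes(duration: str) -> int:
    """Convert an 'H:MM' duration string to whole minutes."""
    hours, minutes = duration.split(":")
    return int(hours) * 60 + int(minutes)


def slowdown(baseline: str, observed: str) -> Tuple[int, float]:
    """Return (extra minutes, percent increase) of observed over baseline."""
    base = to_minutes(baseline)
    extra = to_minutes(observed) - base
    return extra, round(100.0 * extra / base, 1)


# Durations as quoted in the thread (hypothetical labels, real numbers).
BASELINE = "2:21"  # on-demand run on the commit before the test change
RUNS = {
    "on-demand run on the test-change commit": "3:41",
    "nightly run that completed": "2:57",
    "recent daily runs (approximate)": "2:56",
    "slowest pull request build": "4:11",
}

if __name__ == "__main__":
    for label, duration in RUNS.items():
        extra, pct = slowdown(BASELINE, duration)
        print(f"{label}: {duration} (+{extra} min, +{pct}%)")
```

Against that baseline, the ~2:56 daily runs come out about 35 minutes longer, which matches the "about half an hour" estimate, while the 3:41 on-demand run is 80 minutes longer.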