Hi Niketan,

That is a great point to bring up. Your statistics on the run times are very useful. I think Matthias mentioned potentially testing up to 80 GB in another email thread last week. Perhaps that would be a good compromise between data size and test time?
Deron

On Mon, May 23, 2016 at 11:34 AM, Niketan Pansare <[email protected]> wrote:

> +1 for formalizing the release candidate process. Please note: the
> performance suite (the last two checklist items), run on a 6-node cluster
> with XS (0.08 GB) to XL (800 GB) datasets, takes 12-15 days. This estimate
> only includes the following algorithms: l2svm, GLM binomial probit,
> linregcg, linregds, multilogreg, msvm, naive bayes, and kmeans. It does
> not include time to re-execute failed cases (if any) or sparse
> experiments. So, if we include the performance suite in our release
> process, we need to be aware that it would take an additional two weeks.
>
> Here are some statistics that could help us create a smaller performance
> suite:
> 1. 96% of the time is spent in XL cases
> 2. 48% of the time is spent in 3 cases (MultiLogReg XL cp+mr, cp+spark,
> and spark)
> 3. 75% of the time is spent in 9 cases (MultiLogReg/MSVM/Kmeans XL cp+mr,
> cp+spark, and spark)
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> From: Frederick R Reiss/Almaden/IBM@IBMUS
> To: [email protected]
> Date: 05/23/2016 09:57 AM
> Subject: Re: Formalize a release candidate review process?
> ------------------------------
>
> +1 here too. A documented release process is a very good idea. Having a
> written checklist will make it easier to delegate these tasks to
> volunteers who want to help out with the project. It will also build
> confidence among potential users, since we can point to exactly what
> testing has been done on each release.
> And any vendors who are thinking of bundling SystemML with their products
> will want this documentation to support their own release processes.
>
> Fred
>
> From: Luciano Resende <[email protected]>
> To: "[email protected]" <[email protected]>
> Date: 05/21/2016 10:38 AM
> Subject: Re: Formalize a release candidate review process?
> ------------------------------
>
> +1, we should create a web page about producing a release, where one
> section would be how to produce a release candidate, and another section
> would be the items below with a bit more info on how to execute them.
> Then people could claim these or respond to the vote with the things they
> have tested.
>
> Btw, for the build ones, we should recommend building with an empty Maven
> repo.
>
> On Saturday, May 21, 2016, Deron Eriksson <[email protected]> wrote:
>
> > Hi,
> >
> > It might be nice to formalize what needs to be done when reviewing a
> > release candidate. I don't mean this as something that would add
> > bureaucracy that would slow us down. Rather, it would be nice to have
> > something as simple as a basic checklist of items that we could
> > volunteer to check. That way, we could avoid potentially duplicating
> > effort, which would speed us up, and we could avoid potentially missing
> > some critical checks, which would help validate the integrity of our
> > releases.
> >
> > Some potential items to check:
> > 1) Entire test suite should pass on OS X, Windows, and Linux.
> > 2) All artifacts and accompanying checksums are present (see
> > https://dist.apache.org/repos/dist/dev/incubator/systemml/0.10.0-incubating-rc1/)
> > 3) All artifacts containing SystemML classes can execute a 'hello
> > world' example
> > 4) LICENSE and NOTICE files for all the artifacts have been checked
> > 5) SystemML runs algorithms locally in standalone single-node mode
> > 6) SystemML runs algorithms on local Hadoop (hadoop jar ...)
> > 7) SystemML runs algorithms on local Spark (spark-submit ...)
> > 8) SystemML runs algorithms on a Hadoop cluster
> > 9) SystemML runs algorithms on a Spark cluster
> > 10) SystemML performance suite has been run on a Hadoop cluster
> > 11) SystemML performance suite has been run on a Spark cluster
> >
> > Would this be too many things to check or too few? Are there any
> > critical items missing?
> >
> > Deron
>
> --
> Sent from my Mobile device
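For reviewers who pick up the checksum item in the checklist above, the verification step could be scripted roughly as follows. This is only a sketch: the artifact name is a hypothetical example (the real names are whatever sits under the dist.apache.org RC directory), and for self-containment the script generates a stand-in artifact and `.sha512` file locally, whereas a real review would download both from the release staging area.

```shell
#!/bin/sh
# Hypothetical artifact name; a real review would use the files downloaded
# from https://dist.apache.org/repos/dist/dev/incubator/systemml/...
ARTIFACT=systemml-0.10.0-incubating.tgz

# Stand-in setup so this sketch runs offline: create a dummy artifact and
# the checksum file a release manager would normally publish alongside it.
printf 'example release payload' > "$ARTIFACT"
sha512sum "$ARTIFACT" | awk '{print $1}' > "$ARTIFACT.sha512"

# The actual check: recompute the digest and compare it to the published one.
computed=$(sha512sum "$ARTIFACT" | awk '{print $1}')
published=$(cat "$ARTIFACT.sha512")
if [ "$computed" = "$published" ]; then
    echo "checksum OK: $ARTIFACT"
else
    echo "checksum MISMATCH: $ARTIFACT" >&2
    exit 1
fi
```

A PGP signature check (`gpg --verify "$ARTIFACT.asc" "$ARTIFACT"` against the project's KEYS file) would normally accompany this, but is omitted here since it needs the release manager's public key.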
