As Deron mentioned, running all experiments up to 80 GB is a good compromise. Over the weekend, I ran exactly that on Spark 1.6.1, and it took less than a day. At that scale, this approach would leave us room to run MR and different Spark versions as well; a rough driver sketch follows.
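
To make the 80 GB compromise concrete, here is a minimal Python sketch of such a driver. The algorithm list and data scales are taken from Niketan's mail below; the launcher prefix, DML script paths, and nvargs are illustrative assumptions, not the actual perf-suite layout.

import itertools
import subprocess

# Algorithms and data scales are from the thread; paths, DML file names,
# and nvargs below are hypothetical placeholders for illustration.
ALGORITHMS = ["l2svm", "GLM", "linregcg", "linregds",
              "multilogreg", "msvm", "naive-bayes", "kmeans"]
SCALES_GB = {"XS": 0.08, "S": 0.8, "M": 8.0, "L": 80.0, "XL": 800.0}
MAX_GB = 80.0  # the 80 GB compromise discussed above


def run_suite(backend_cmd: list[str]) -> None:
    """Run every algorithm at every scale up to the cap on one backend.

    backend_cmd is the launcher prefix, e.g. a spark-submit invocation for
    a given Spark version, so the same loop covers MR and several Spark
    builds by swapping the prefix.
    """
    for algo, (label, size_gb) in itertools.product(ALGORITHMS,
                                                    SCALES_GB.items()):
        if size_gb > MAX_GB:
            continue  # skip XL (800 GB) under the 80 GB compromise
        # Hypothetical script path and nvargs; adjust to the real suite.
        cmd = backend_cmd + ["-f", f"scripts/algorithms/{algo}.dml",
                             "-nvargs", f"X=data/{algo}_{label}/X"]
        subprocess.run(cmd, check=True)


# 'spark-submit SystemML.jar -f <script>.dml' is SystemML's standard Spark
# entry point; repeat with an MR launcher or another Spark build as needed.
run_suite(["spark-submit", "SystemML.jar"])
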
Regarding the original mail, I think we can deduplicate the list a bit: running the performance tests over all algorithms obviously covers the checks of our algorithms as well. Other than that, +1 for creating these documents for the release process.

Regards,
Matthias

From: Luciano Resende <[email protected]>
To: [email protected]
Date: 05/23/2016 12:15 PM
Subject: Re: Formalize a release candidate review process?

On Mon, May 23, 2016 at 11:34 AM, Niketan Pansare <[email protected]> wrote:

> +1 for formalizing the release candidate process. Please note: points 9
> and 10 (i.e., the performance suite) on a 6-node cluster, covering XS
> (0.08 GB) to XL (800 GB) datasets, take 12-15 days. This estimate only
> includes the following algorithms: l2svm, GLM binomial probit, linregcg,
> linregds, multilogreg, msvm, naive bayes, and kmeans. It does not include
> time to re-execute failed cases (if any) or sparse experiments. So, if we
> include points 9 and 10 in our release process, we need to be aware that
> they would add two weeks.

An alternative would be to run this code in trunk when we start preparing
for the release. Another approach would be to release it, and provide a
minor release if there are performance fixes that need to go on top of the
release.

--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
