There is also the possibility of writing the correctness tests entirely in DML, allowing an ML researcher or data scientist to create the tests easily. For example, the SystemML-NN library has a full test suite in the `nn/test/` directory written entirely in DML (i.e., no Java tests) that checks the mathematical correctness of gradients, as well as the general correctness of the various layers as needed.
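To give a flavor of what such a test checks, here is a minimal sketch of a finite-difference gradient check, written in Python/NumPy purely for illustration (the actual suite is in DML, and the toy affine layer and all names below are illustrative):

```python
# Minimal sketch of a gradient check (Python/NumPy for illustration only;
# the SystemML-NN suite performs this kind of check in DML).
import numpy as np

def affine_forward(X, W, b):
    # Fully-connected layer: out = X W + b
    return X @ W + b

def affine_backward(X, dout):
    # Analytic gradient of the loss w.r.t. W, given upstream gradient dout
    return X.T @ dout

def grad_check_W(X, W, b, eps=1e-5, tol=1e-5):
    # Use loss = sum(out), so the upstream gradient is a matrix of ones.
    dout = np.ones((X.shape[0], W.shape[1]))
    dW = affine_backward(X, dout)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            W_pos, W_neg = W.copy(), W.copy()
            W_pos[i, j] += eps
            W_neg[i, j] -= eps
            # Centered finite-difference estimate of dloss/dW[i, j]
            num = (affine_forward(X, W_pos, b).sum()
                   - affine_forward(X, W_neg, b).sum()) / (2 * eps)
            rel = abs(dW[i, j] - num) / max(abs(dW[i, j]), abs(num), 1e-12)
            assert rel < tol, f"gradient mismatch at ({i}, {j}): {rel:.2e}"

rng = np.random.default_rng(42)
grad_check_W(rng.standard_normal((4, 3)), rng.standard_normal((3, 2)),
             rng.standard_normal(2))
print("gradient check passed")
```

The same centered-difference idea underlies gradient checking for any layer's backward pass, regardless of the language the test is written in.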
--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.

> On Feb 17, 2017, at 5:46 PM, Deron Eriksson <deroneriks...@gmail.com> wrote:
>
> +1 for creating tests for the main algorithm scripts. This would be a great
> addition to the project.
>
> Note that the creation of tests (junit) typically requires some Java skills
> (and knowledge of ML algorithms), whereas a new algorithm script typically
> requires R/Python skills. Therefore, testing of algorithms probably requires
> some focused coordination between 'data scientists' and 'developers' for
> this to happen smoothly for new algorithms.
>
> Deron
>
>
>> On Fri, Feb 17, 2017 at 5:28 PM, <dusenberr...@gmail.com> wrote:
>>
>> +1 for testing our actual (vs. simplified test version) scripts against
>> some metric of choice. This will allow us to ensure (1) that each script
>> does not have a showstopper bug (engine bug), and (2) that each script is
>> still producing a reasonable mathematical result (math bug).
>>
>> -Mike
>>
>> --
>>
>> Mike Dusenberry
>> GitHub: github.com/dusenberrymw
>> LinkedIn: linkedin.com/in/mikedusenberry
>>
>> Sent from my iPhone.
>>
>>
>>> On Feb 17, 2017, at 4:17 PM, Niketan Pansare <npan...@us.ibm.com> wrote:
>>>
>>> For now, I have updated our Python mllearn tests to compare the
>>> predictions of our algorithms to those of scikit-learn:
>>> https://github.com/apache/incubator-systemml/blob/master/src/main/python/tests/test_mllearn_numpy.py#L81
>>>
>>> The test now uses the scikit-learn predictions as the baseline and
>>> computes the scores (accuracy score for classifiers and r2 score for
>>> regressors). If the score is greater than 95%, the test passes. Though
>>> this approach does not measure the generalization capability of our
>>> algorithms, it at least ensures that they perform no worse than
>>> scikit-learn under default settings. We can make the testing even more
>>> rigorous later. The next step would be to enable these Python tests
>>> through Jenkins.
>>>
>>> Thanks,
>>>
>>> Niketan Pansare
>>> IBM Almaden Research Center
>>> E-mail: npansar At us.ibm.com
>>> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
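As a concrete sketch of that comparison pattern (illustrative only; the real code is in test_mllearn_numpy.py, and the predictions under test are stubbed here so the snippet stays self-contained):

```python
# Sketch of the baseline-comparison pattern described above (illustrative,
# not the actual test code). scikit-learn predictions serve as the baseline;
# the score against them must exceed the 0.95 threshold.
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score  # r2_score for regressors
from sklearn.model_selection import train_test_split

X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=31)

baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
baseline_pred = baseline.predict(X_test)

# In the real test these predictions come from the SystemML mllearn
# estimator; here the baseline is reused so the sketch runs on its own.
our_pred = baseline_pred

# Classifier case: accuracy of our predictions against the baseline's.
score = accuracy_score(baseline_pred, our_pred)
assert score > 0.95, f"score {score:.3f} fell below the 0.95 threshold"
```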
>>> From: Matthias Boehm <mboe...@googlemail.com>
>>> To: dev@systemml.incubator.apache.org
>>> Date: 02/17/2017 11:54 AM
>>> Subject: Re: Proposal to add 'accuracy test suite' before 1.0 release
>>>
>>> Yes, this has been discussed a couple of times now, most recently in
>>> SYSTEMML-546. It takes quite some effort though to create a
>>> sophisticated algorithm-level test suite as done for GLM. So by all
>>> means, please, go ahead and add these tests.
>>>
>>> However, I would not impose any constraints on the contribution of new
>>> algorithms in that regard, or similarly on tests with simplified
>>> algorithms, because it would raise the bar too high.
>>>
>>> Regards,
>>> Matthias
>>>
>>>
>>>> On 2/17/2017 10:48 AM, Niketan Pansare wrote:
>>>>
>>>> Hi all,
>>>>
>>>> We currently test the correctness of individual runtime operators using
>>>> our integration tests, but not the "released" algorithms. To be fair, we
>>>> do test a subset of "simplified" algorithms on synthetic datasets and
>>>> compare the accuracy with R. We also test a subset of the released
>>>> algorithms using our Python tests, but their intended purpose is only to
>>>> test the integration of the APIs:
>>>>
>>>> Simplified algorithms:
>>>> https://github.com/apache/incubator-systemml/tree/master/src/test/scripts/applications
>>>> Released algorithms:
>>>> https://github.com/apache/incubator-systemml/tree/master/scripts/algorithms
>>>> Python tests:
>>>> https://github.com/apache/incubator-systemml/tree/master/src/main/python/tests
>>>>
>>>> Though a released algorithm is tested when it is initially introduced,
>>>> other factors (Spark versions, API changes, engine improvements, etc.)
>>>> could cause it to return incorrect results over time. Therefore, similar
>>>> to our performance test suite
>>>> (https://github.com/apache/incubator-systemml/tree/master/scripts/perftest),
>>>> I propose we create another test suite (an "accuracy test suite", for
>>>> lack of a better term) that compares the accuracy (or some other metric)
>>>> of our released algorithms on standard datasets. Making it a requirement
>>>> to add tests to the accuracy test suite when adding a new algorithm will
>>>> greatly improve the production-readiness of SystemML and serve as a
>>>> usage guide as well. This implies that we run both the performance and
>>>> the accuracy test suites before each release. The alternative is to
>>>> replace the simplified algorithms with our released algorithms.
>>>>
>>>> Advantages of the accuracy-test-suite approach:
>>>> 1. No increase in the running time of the integration tests on Jenkins.
>>>> 2. The accuracy test suite could use much larger datasets.
>>>> 3. The accuracy test suite could include algorithms that take longer to
>>>>    converge (for example, deep learning algorithms).
>>>>
>>>> Advantage of replacing the simplified algorithms:
>>>> 1. No commit breaks any of the existing algorithms.
>>>>
>>>> Thanks,
>>>>
>>>> Niketan Pansare
>>>> IBM Almaden Research Center
>>>> E-mail: npansar At us.ibm.com
>>>> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
>
> --
> Deron Eriksson
> Spark Technology Center
> http://www.spark.tc/
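As a rough illustration of the proposed accuracy test suite, a driver could look like the following sketch; the script/dataset pairings and baseline values are made up, and the actual SystemML invocation is left as a stub:

```python
# Rough sketch of an accuracy-test-suite driver (all pairings and baseline
# values are hypothetical). Each released algorithm is run on a standard
# dataset and its metric is compared against a recorded baseline, analogous
# in spirit to scripts/perftest.

# (algorithm script, dataset path, metric name, baseline) tuples.
SUITE = [
    ("MultiLogReg.dml", "data/mnist",   "accuracy", 0.90),
    ("LinearRegCG.dml", "data/airline", "r2",       0.70),
]

def run_algorithm(script: str, dataset: str, metric: str) -> float:
    """Placeholder: invoke SystemML on `script` over `dataset` and
    return the requested evaluation metric."""
    raise NotImplementedError

def main() -> None:
    failures = []
    for script, dataset, metric, baseline in SUITE:
        score = run_algorithm(script, dataset, metric)
        status = "OK" if score >= baseline else "FAIL"
        print(f"{script} on {dataset}: {metric}={score:.3f} "
              f"(baseline {baseline:.2f}) {status}")
        if score < baseline:
            failures.append(script)
    if failures:
        raise SystemExit(f"accuracy regressions in: {', '.join(failures)}")

if __name__ == "__main__":
    main()
```

Run before each release alongside the performance suite, a driver like this would catch the "engine bug" and "math bug" regressions discussed above.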