There is also the possibility of writing the correctness tests entirely in DML, allowing an ML researcher or data scientist to create the tests easily. For example, the SystemML-NN library has a full test suite in the `nn/test/` directory written entirely in DML (i.e., no Java tests) that checks the mathematical correctness of gradients, as well as the general correctness of the various layers as needed.
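To give a flavor of what such a test checks, here is a minimal sketch of a finite-difference gradient check, written in Python/NumPy purely for illustration (the actual suite is in DML, and the toy affine layer and all names below are illustrative):

```python
# Minimal sketch of a gradient check (Python/NumPy for illustration only;
# the SystemML-NN suite performs this kind of check in DML).
import numpy as np

def affine_forward(X, W, b):
    # Fully-connected layer: out = X W + b
    return X @ W + b

def affine_backward(X, dout):
    # Analytic gradient of the loss w.r.t. W, given upstream gradient dout
    return X.T @ dout

def grad_check_W(X, W, b, eps=1e-5, tol=1e-5):
    # Use loss = sum(out), so the upstream gradient is a matrix of ones.
    dout = np.ones((X.shape[0], W.shape[1]))
    dW = affine_backward(X, dout)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            W_pos, W_neg = W.copy(), W.copy()
            W_pos[i, j] += eps
            W_neg[i, j] -= eps
            # Centered finite-difference estimate of dloss/dW[i, j]
            num = (affine_forward(X, W_pos, b).sum()
                   - affine_forward(X, W_neg, b).sum()) / (2 * eps)
            rel = abs(dW[i, j] - num) / max(abs(dW[i, j]), abs(num), 1e-12)
            assert rel < tol, f"gradient mismatch at ({i}, {j}): {rel:.2e}"

rng = np.random.default_rng(42)
grad_check_W(rng.standard_normal((4, 3)), rng.standard_normal((3, 2)),
             rng.standard_normal(2))
print("gradient check passed")
```

The same centered-difference idea underlies gradient checking for any layer's backward pass, regardless of the language the test is written in.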
--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.

> On Feb 17, 2017, at 5:46 PM, Deron Eriksson <deroneriks...@gmail.com> wrote:
>
> +1 for creating tests for the main algorithm scripts. This would be a great
> addition to the project.
>
> Note that the creation of tests (junit) typically requires some Java skills
> (and knowledge of ML algorithms), whereas a new algorithm script typically
> requires R/Python skills. Therefore, testing of algorithms probably requires
> some focused coordination between 'data scientists' and 'developers' for
> this to happen smoothly for new algorithms.
>
> Deron
>
>
>> On Fri, Feb 17, 2017 at 5:28 PM, <dusenberr...@gmail.com> wrote:
>>
>> +1 for testing our actual (vs. simplified test version) scripts against
>> some metric of choice. This will allow us to ensure (1) that each script
>> does not have a showstopper bug (engine bug), and (2) that each script is
>> still producing a reasonable mathematical result (math bug).
>>
>> -Mike
>>
>> --
>>
>> Mike Dusenberry
>> GitHub: github.com/dusenberrymw
>> LinkedIn: linkedin.com/in/mikedusenberry
>>
>> Sent from my iPhone.
>>
>>
>>> On Feb 17, 2017, at 4:17 PM, Niketan Pansare <npan...@us.ibm.com> wrote:
>>>
>>> For now, I have updated our Python mllearn tests to compare the
>>> predictions of our algorithms to those of scikit-learn:
>>> https://github.com/apache/incubator-systemml/blob/master/src/main/python/tests/test_mllearn_numpy.py#L81
>>>
>>> The test now uses the scikit-learn predictions as the baseline and
>>> computes the scores (accuracy score for classifiers and r2 score for
>>> regressors). If the score is greater than 95%, the test passes. Though
>>> this approach does not measure the generalization capability of our
>>> algorithms, it at least ensures that they perform no worse than
>>> scikit-learn under default settings. We can make the testing even more
>>> rigorous later. The next step would be to enable these Python tests
>>> through Jenkins.
>>>
>>> Thanks,
>>>
>>> Niketan Pansare
>>> IBM Almaden Research Center
>>> E-mail: npansar At us.ibm.com
>>> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
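As a concrete sketch of that comparison pattern (illustrative only; the real code is in test_mllearn_numpy.py, and the predictions under test are stubbed here so the snippet stays self-contained):

```python
# Sketch of the baseline-comparison pattern described above (illustrative,
# not the actual test code). scikit-learn predictions serve as the baseline;
# the score against them must exceed the 0.95 threshold.
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score  # r2_score for regressors
from sklearn.model_selection import train_test_split

X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=31)

baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
baseline_pred = baseline.predict(X_test)

# In the real test these predictions come from the SystemML mllearn
# estimator; here the baseline is reused so the sketch runs on its own.
our_pred = baseline_pred

# Classifier case: accuracy of our predictions against the baseline's.
score = accuracy_score(baseline_pred, our_pred)
assert score > 0.95, f"score {score:.3f} fell below the 0.95 threshold"
```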
>>> From: Matthias Boehm <mboe...@googlemail.com>
>>> To: dev@systemml.incubator.apache.org
>>> Date: 02/17/2017 11:54 AM
>>> Subject: Re: Proposal to add 'accuracy test suite' before 1.0 release
>>>
>>> Yes, this has been discussed a couple of times now, most recently in
>>> SYSTEMML-546. It takes quite some effort though to create a
>>> sophisticated algorithm-level test suite as done for GLM. So by all
>>> means, please, go ahead and add these tests.
>>>
>>> However, I would not impose any constraints on the contribution of new
>>> algorithms in that regard, or similarly on tests with simplified
>>> algorithms, because it would raise the bar too high.
>>>
>>> Regards,
>>> Matthias
>>>
>>>
>>>> On 2/17/2017 10:48 AM, Niketan Pansare wrote:
>>>>
>>>> Hi all,
>>>>
>>>> We currently test the correctness of individual runtime operators using
>>>> our integration tests, but not the "released" algorithms. To be fair, we
>>>> do test a subset of "simplified" algorithms on synthetic datasets and
>>>> compare the accuracy with R. We also test a subset of the released
>>>> algorithms using our Python tests, but their intended purpose is only to
>>>> test the integration of the APIs:
>>>>
>>>> Simplified algorithms:
>>>> https://github.com/apache/incubator-systemml/tree/master/src/test/scripts/applications
>>>> Released algorithms:
>>>> https://github.com/apache/incubator-systemml/tree/master/scripts/algorithms
>>>> Python tests:
>>>> https://github.com/apache/incubator-systemml/tree/master/src/main/python/tests
>>>>
>>>> Though a released algorithm is tested when it is initially introduced,
>>>> other factors (Spark versions, API changes, engine improvements, etc.)
>>>> could cause it to return incorrect results over time. Therefore, similar
>>>> to our performance test suite
>>>> (https://github.com/apache/incubator-systemml/tree/master/scripts/perftest),
>>>> I propose we create another test suite (an "accuracy test suite", for
>>>> lack of a better term) that compares the accuracy (or some other metric)
>>>> of our released algorithms on standard datasets. Making it a requirement
>>>> to add tests to the accuracy test suite when adding a new algorithm will
>>>> greatly improve the production-readiness of SystemML and serve as a
>>>> usage guide as well. This implies that we run both the performance and
>>>> the accuracy test suites before each release. The alternative is to
>>>> replace the simplified algorithms with our released algorithms.
>>>>
>>>> Advantages of the accuracy-test-suite approach:
>>>> 1. No increase in the running time of the integration tests on Jenkins.
>>>> 2. The accuracy test suite could use much larger datasets.
>>>> 3. The accuracy test suite could include algorithms that take longer to
>>>>    converge (for example, deep learning algorithms).
>>>>
>>>> Advantage of replacing the simplified algorithms:
>>>> 1. No commit breaks any of the existing algorithms.
>>>>
>>>> Thanks,
>>>>
>>>> Niketan Pansare
>>>> IBM Almaden Research Center
>>>> E-mail: npansar At us.ibm.com
>>>> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
>
> --
> Deron Eriksson
> Spark Technology Center
> http://www.spark.tc/
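As a rough illustration of the proposed accuracy test suite, a driver could look like the following sketch; the script/dataset pairings and baseline values are made up, and the actual SystemML invocation is left as a stub:

```python
# Rough sketch of an accuracy-test-suite driver (all pairings and baseline
# values are hypothetical). Each released algorithm is run on a standard
# dataset and its metric is compared against a recorded baseline, analogous
# in spirit to scripts/perftest.

# (algorithm script, dataset path, metric name, baseline) tuples.
SUITE = [
    ("MultiLogReg.dml", "data/mnist",   "accuracy", 0.90),
    ("LinearRegCG.dml", "data/airline", "r2",       0.70),
]

def run_algorithm(script: str, dataset: str, metric: str) -> float:
    """Placeholder: invoke SystemML on `script` over `dataset` and
    return the requested evaluation metric."""
    raise NotImplementedError

def main() -> None:
    failures = []
    for script, dataset, metric, baseline in SUITE:
        score = run_algorithm(script, dataset, metric)
        status = "OK" if score >= baseline else "FAIL"
        print(f"{script} on {dataset}: {metric}={score:.3f} "
              f"(baseline {baseline:.2f}) {status}")
        if score < baseline:
            failures.append(script)
    if failures:
        raise SystemExit(f"accuracy regressions in: {', '.join(failures)}")

if __name__ == "__main__":
    main()
```

Run before each release alongside the performance suite, a driver like this would catch the "engine bug" and "math bug" regressions discussed above.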