[ https://issues.apache.org/jira/browse/SYSTEMML-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953912#comment-15953912 ]
Frederick Reiss commented on SYSTEMML-1451:
-------------------------------------------

With regard to the existing automation of data generation: while there are already scripts to run the data generation, training, and test phases of the performance tests, I think some additional automation will be necessary before we can reliably use these tests for automated regression testing. Important things to add:

* Create a single entry point for running any subset of the tests at any subset of the available scale factors with a single command. The current solution requires commenting and uncommenting lines in bash scripts. (A sketch of such an entry point, including the configuration file from the next item, appears after the quoted issue text below.)
* Allow the user to provide a configuration file that specifies what combinations of data and algorithm parameters to run. The current solution requires deciphering undocumented or poorly documented script arguments, then editing, commenting, and uncommenting lines in bash scripts.
* Detect and reuse previously generated data if present. This step is currently entirely manual. (Also sketched below.)
* Integrate a small-scale version of the performance tests into the primary Maven build. This would help ensure that code pushed to git doesn't break an algorithm script. We currently do not test for algorithm breakage on a regular basis.
* Create a driver script that checks out and builds the latest SystemML, then runs the performance tests. We will need some variant of this script if we are to automate performance regression testing via Jenkins. (Also sketched below.)

> Automate performance testing and reporting
> ------------------------------------------
>
>                 Key: SYSTEMML-1451
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1451
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Infrastructure, Test
>            Reporter: Nakul Jindal
>              Labels: gsoc2017, mentor, performance, reporting, testing
>
> As part of a release (and in general), performance tests are run for SystemML.
> Currently, running and reporting on these performance tests is a manual
> process. There are helper scripts, but the process is largely manual.
> The aim of this GSoC 2017 project is to automate performance testing and its
> reporting. These are the tasks this entails:
> 1. Automate running of the performance tests, including generation of test
>    data.
> 2. Detect errors and report them, if any.
> 3. Record performance benchmarking information.
> 4. Automatically compare this performance to previous versions to check for
>    performance regressions.
> 5. Automatically compare to Spark MLlib (and possibly R and Julia).
> 6. Prepare a report with all the information: failed jobs, performance
>    figures, and performance against other comparable projects/algorithms
>    (plotted, or in plain text as CSV, PDF, or another common format).
> 7. Create scripts to automatically run this process on a cloud provider that
>    spins up machines, runs the tests, saves the reports, and spins down the
>    machines.
> 8. Create a web application to do this interactively, without dropping down
>    into a shell.
> As part of this project, the student will need to know scripting (in Bash,
> Python, etc.). It may also involve changing error reporting and performance
> reporting code in SystemML.
> Rating - Medium (for the amount of work)
> Mentor - [~nakul02] (other co-mentors will join in)
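To make the first two items above concrete, here is a minimal sketch of a single entry point in Python. Everything here is hypothetical: run_perf_tests.py does not exist yet, the flags, the JSON config format, and the run_one() hook are illustrative only, not existing SystemML scripts.

    #!/usr/bin/env python
    # Hypothetical entry point for the perf-test suite (illustrative sketch).
    import argparse
    import json

    def run_one(test, scale, params):
        # Placeholder: a real implementation would invoke the existing bash
        # scripts for data generation, training, and testing with these values.
        print("would run %s at scale factor %s with params %s"
              % (test, scale, params))

    def main():
        parser = argparse.ArgumentParser(
            description="Run SystemML performance tests (sketch).")
        parser.add_argument("--tests", nargs="+", default=["all"],
                            help="subset of test families to run")
        parser.add_argument("--scale-factors", nargs="+", type=int, default=[1],
                            help="subset of the available scale factors")
        parser.add_argument("--config",
                            help="JSON file of data/algorithm parameter combos")
        args = parser.parse_args()

        # A config file, when given, overrides the command-line selections.
        config = json.load(open(args.config)) if args.config else {}
        tests = config.get("tests", args.tests)
        scales = config.get("scale_factors", args.scale_factors)
        for test in tests:
            for scale in scales:
                run_one(test, scale, config.get("params", {}))

    if __name__ == "__main__":
        main()

Running any subset of tests would then be one command, e.g. "python run_perf_tests.py --tests binomial multinomial --scale-factors 1 10", or "python run_perf_tests.py --config nightly.json", with no script editing required.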
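For the detect-and-reuse item, a minimal sketch of the idea, assuming generated data lands in a local directory with a _SUCCESS marker written only after generation completes; both of those are assumptions (the real data may live in HDFS), and genData.sh is a stand-in for the existing generation scripts:

    # Generate input data for (test, scale) only if it is not already present.
    import os
    import subprocess

    def ensure_data(test, scale, data_root="perftest_data"):
        data_dir = os.path.join(data_root, "%s_sf%d" % (test, scale))
        marker = os.path.join(data_dir, "_SUCCESS")  # written after success
        if os.path.exists(marker):
            print("reusing existing data in " + data_dir)
            return data_dir
        if not os.path.isdir(data_dir):
            os.makedirs(data_dir)
        # Hypothetical call into the existing data-generation scripts:
        subprocess.check_call(["./genData.sh", test, str(scale), data_dir])
        open(marker, "w").close()
        return data_dir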
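Finally, a sketch of the driver script for Jenkins-style automation. The git and Maven commands are standard for building SystemML; the repository URL and the hand-off to the entry point sketched above are assumptions:

    # Check out and build the latest SystemML, then run the performance tests.
    import subprocess

    def sh(cmd, cwd=None):
        print("+ " + " ".join(cmd))
        subprocess.check_call(cmd, cwd=cwd)

    def main():
        sh(["git", "clone", "--depth", "1",
            "https://github.com/apache/systemml.git", "systemml"])
        sh(["mvn", "-DskipTests", "clean", "package"], cwd="systemml")
        # Hypothetical entry point from the first sketch:
        sh(["python", "run_perf_tests.py", "--tests", "all"], cwd="systemml")

    if __name__ == "__main__":
        main()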