[SYSTEMML-1451][Phase3] phase 3 work

- Offline CSV support
- Family bug fix
- Plots
- Doc Update
- Stats update
- Bug train, predict append family name

Closes #604

Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/fdc2be22
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/fdc2be22
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/fdc2be22

Branch: refs/heads/gh-pages
Commit: fdc2be22b66151f798a6e2b5be439bd616d24494
Parents: 07bd40a
Author: krishnakalyan3 <[email protected]>
Authored: Sat Aug 26 11:52:59 2017 -0700
Committer: Nakul Jindal <[email protected]>
Committed: Sat Aug 26 11:52:59 2017 -0700

----------------------------------------------------------------------
 python-performance-test.md | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/systemml/blob/fdc2be22/python-performance-test.md
----------------------------------------------------------------------
diff --git a/python-performance-test.md b/python-performance-test.md
index ce36c2d..25e1f35 100644
--- a/python-performance-test.md
+++ b/python-performance-test.md
@@ -148,6 +148,17 @@ Run performance test for all algorithms under the family `regression2` and log w
 
 Run performance test for all algorithms using HDFS.
 
+## Result Consolidation and Plotting
+We have two scripts: `stats.py` for pulling results from Google Docs and `update.py` for updating results to Google Docs or the local file system.
+
+Example of `update.py` below
+`./scripts/perftest/python/google_docs/update.py --file ../../temp/perf_test_singlenode.out --exec-type singlenode --tag 2 --append test.csv`
+The arguments are: `--file`, the path of the perf-test output; `--exec-type`, the execution mode used to generate the perf-test output; `--tag`, the release version or a unique name; and `--append`, an optional argument that appends the results to a local csv file. If `--auth` is used instead of `--append`, it needs the location of the `google api key` file.
+
+Example of `stats.py` below
+` ./stats.py --auth ../key/client_json.json --exec-type singlenode --plot stats1_data-gen_none_dense_10k_100`
+The `--plot` argument needs the name of the composite key that you would like to compare results over. If this argument is not specified, the results are grouped by keys.
+
 ## Operational Notes
 All performance tests depend mainly on two scripts for execution: `systemml-standalone.py` and `systemml-spark-submit.py`. In case we need to change standalone or spark parameters, we need to manually change these parameters in their respective scripts.
 
@@ -158,7 +169,7 @@ The logs contain the following information below comma separated.
 
 algorithm | run_type | intercept | matrix_type | data_shape | time_sec
 --- | --- | --- | --- | --- | --- |
-multinomial|data-gen|0|dense|10k_100| 0.33
+multinomial|data-gen|0|10k_100|dense| 0.33
 MultiLogReg|train|0|10k_100|dense|6.956
 MultiLogReg|predict|0|10k_100|dense|4.780
 
@@ -187,9 +198,12 @@ Matrix Shape | Approximate Data Size
 10M_1k|80GB
 100M_1k|800GB
 
+
 For example the command below runs performance test for all data sizes described above
 `run_perftest.py --family binomial clustering multinomial regression1 regression2 stats1 stats2 --mat-shape 10k_1k 100k_1k 1M_1k 10M_1k 100M_1k --master yarn-client --temp-dir hdfs://localhost:9000/user/systemml`
 
+By default, data generated in `hybrid_spark` execution mode is in the current user's `hdfs` home directory.
+
 Note: Please use this command `pip3 install -r requirements.txt` before using the perftest scripts.
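
The "Matrix Shape | Approximate Data Size" rows in the last hunk (`10M_1k|80GB`, `100M_1k|800GB`) are consistent with a dense matrix that stores one 8-byte double per cell, i.e. rows x columns x 8 bytes. A minimal sketch of that arithmetic follows. The helpers `parse_dim` and `approx_dense_bytes` are hypothetical names, not part of the perftest scripts, and the 8-byte cell size is an assumption inferred from the table rather than something the patch states.

```python
# Hypothetical helpers, NOT part of the SystemML perftest scripts.
# Assumption: a dense matrix stores one 8-byte double per cell.

SUFFIXES = {"k": 10**3, "M": 10**6}


def parse_dim(token):
    """Convert a dimension token such as '10k', '100M' or '100' to an int."""
    if token[-1] in SUFFIXES:
        return int(token[:-1]) * SUFFIXES[token[-1]]
    return int(token)


def approx_dense_bytes(mat_shape, bytes_per_cell=8):
    """Approximate size in bytes of a dense matrix given a 'rows_cols' shape."""
    rows, cols = (parse_dim(t) for t in mat_shape.split("_"))
    return rows * cols * bytes_per_cell


if __name__ == "__main__":
    # Shapes taken from the table rows visible in the diff.
    for shape in ["10M_1k", "100M_1k"]:
        print(shape, approx_dense_bytes(shape) / 10**9, "GB")
```

Under these assumptions `10M_1k` works out to 80 GB and `100M_1k` to 800 GB, matching the table; sparse matrices or other storage formats would not follow this estimate.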
