Repository: incubator-systemml
Updated Branches:
  refs/heads/master b584aecf6 -> 1035699c3
[SYSTEMML-764] Add Univar-Stats.dml labeled console output

Added console output.
Removed the existing table listing number and name of univariate statistics to avoid redundancy.

Closes #192.

Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/1035699c
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/1035699c
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/1035699c

Branch: refs/heads/master
Commit: 1035699c3b23bef916eab4738a9d2d64e98d9d6e
Parents: b584aec
Author: Sandeep Narayanaswami <[email protected]>
Authored: Tue Jul 19 15:36:28 2016 -0700
Committer: Glenn Weidner <[email protected]>
Committed: Tue Jul 19 15:36:28 2016 -0700

----------------------------------------------------------------------
 docs/quick-start-guide.md | 103 ++++++++++++++++++++++++++---------------
 1 file changed, 66 insertions(+), 37 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1035699c/docs/quick-start-guide.md
----------------------------------------------------------------------
diff --git a/docs/quick-start-guide.md b/docs/quick-start-guide.md
index ed6bd3a..f05db25 100644
--- a/docs/quick-start-guide.md
+++ b/docs/quick-start-guide.md
@@ -145,25 +145,78 @@ for each feature column using the algorithm `Univar-Stats.dml` which requires 3
 * `X`: location of the input data file to analyze
 * `TYPES`: location of the file that contains the feature column types encoded by integer numbers: `1` = scale, `2` = nominal, `3` = ordinal
-* `STATS`: location of the output matrix of computed statistics will be stored
+* `STATS`: location where the output matrix of computed statistics is to be stored
 
 We need to create a file `types.csv` that describes the type of each column in
-the data along with it's metadata file `types.csv.mtd`.
+the data along with its metadata file `types.csv.mtd`.
 
     $ echo '1,1,1,2' > data/types.csv
     $ echo '{"rows": 1, "cols": 4, "format": "csv"}' > data/types.csv.mtd
 
-To run the `Univar-Stats.dml` algorithm, issue the following command:
-
-    $ ./runStandaloneSystemML.sh scripts/algorithms/Univar-Stats.dml -nvargs X=data/haberman.data TYPES=data/types.csv STATS=data/univarOut.mtx
-
-The resulting matrix has one row per each univariate statistic and one column
-per input feature. The output file `univarOut.mtx` describes that
-matrix. The elements of the first column denote the number of the statistic,
-the elements of the second column refer to the number of the feature column in
-the input data, and the elements of the third column show the value of the
-univariate statistic.
+To run the `Univar-Stats.dml` algorithm, issue the following command (we set the optional argument `CONSOLE_OUTPUT` to `TRUE` to print the statistics to the console):
+
+    $ ./runStandaloneSystemML.sh scripts/algorithms/Univar-Stats.dml -nvargs X=data/haberman.data TYPES=data/types.csv STATS=data/univarOut.mtx CONSOLE_OUTPUT=TRUE
+
+    [...]
+    -------------------------------------------------
+    Feature [1]: Scale
+     (01) Minimum             | 30.0
+     (02) Maximum             | 83.0
+     (03) Range               | 53.0
+     (04) Mean                | 52.45751633986928
+     (05) Variance            | 116.71458266366658
+     (06) Std deviation       | 10.803452349303281
+     (07) Std err of mean     | 0.6175922641866753
+     (08) Coeff of variation  | 0.20594669940735139
+     (09) Skewness            | 0.1450718616532357
+     (10) Kurtosis            | -0.6150152487211726
+     (11) Std err of skewness | 0.13934809593495995
+     (12) Std err of kurtosis | 0.277810485320835
+     (13) Median              | 52.0
+     (14) Interquartile mean  | 52.16013071895425
+    -------------------------------------------------
+    Feature [2]: Scale
+     (01) Minimum             | 58.0
+     (02) Maximum             | 69.0
+     (03) Range               | 11.0
+     (04) Mean                | 62.85294117647059
+     (05) Variance            | 10.558630665380907
+     (06) Std deviation       | 3.2494046632238507
+     (07) Std err of mean     | 0.18575610076612029
+     (08) Coeff of variation  | 0.051698529971741194
+     (09) Skewness            | 0.07798443581479181
+     (10) Kurtosis            | -1.1324380182967442
+     (11) Std err of skewness | 0.13934809593495995
+     (12) Std err of kurtosis | 0.277810485320835
+     (13) Median              | 63.0
+     (14) Interquartile mean  | 62.80392156862745
+    -------------------------------------------------
+    Feature [3]: Scale
+     (01) Minimum             | 0.0
+     (02) Maximum             | 52.0
+     (03) Range               | 52.0
+     (04) Mean                | 4.026143790849673
+     (05) Variance            | 51.691117539912135
+     (06) Std deviation       | 7.189653506248555
+     (07) Std err of mean     | 0.41100513466216837
+     (08) Coeff of variation  | 1.7857418611299172
+     (09) Skewness            | 2.954633471088322
+     (10) Kurtosis            | 11.425776549251449
+     (11) Std err of skewness | 0.13934809593495995
+     (12) Std err of kurtosis | 0.277810485320835
+     (13) Median              | 1.0
+     (14) Interquartile mean  | 1.2483660130718954
+    -------------------------------------------------
+    Feature [4]: Categorical (Nominal)
+     (15) Num of categories   | 2
+     (16) Mode                | 1
+     (17) Num of modes        | 1
+
+
+The `Univar-Stats.dml` script writes the computed statistics to the `univarOut.mtx` file.
The matrix has one row per univariate statistic and one column per input feature. The first column gives the number of the statistic
+(see above table), the second column gives the number of the feature column in
+the input data, and the third column gives the value of the univariate statistic.
 
     1 1 30.0
     1 2 58.0
@@ -210,31 +263,6 @@ univariate statistic.
     16 4 1.0
     17 4 1.0
 
-The following table lists the number and name of each univariate statistic. The row
-numbers below correspond to the elements of the first column in the output
-matrix above. The signs "+" show applicability to scale or/and to categorical
-features.
-
-    | Row | Name of Statistic          | Scale | Categ. |
-    | :-: |:-------------------------- |:-----:| :-----:|
-    |  1  | Minimum                    |   +   |        |
-    |  2  | Maximum                    |   +   |        |
-    |  3  | Range                      |   +   |        |
-    |  4  | Mean                       |   +   |        |
-    |  5  | Variance                   |   +   |        |
-    |  6  | Standard deviation         |   +   |        |
-    |  7  | Standard error of mean     |   +   |        |
-    |  8  | Coefficient of variation   |   +   |        |
-    |  9  | Skewness                   |   +   |        |
-    | 10  | Kurtosis                   |   +   |        |
-    | 11  | Standard error of skewness |   +   |        |
-    | 12  | Standard error of kurtosis |   +   |        |
-    | 13  | Median                     |   +   |        |
-    | 14  | Inter quartile mean        |   +   |        |
-    | 15  | Number of categories       |       |   +    |
-    | 16  | Mode                       |       |   +    |
-    | 17  | Number of modes            |       |   +    |
-
 <br/>
 
 <br/>
@@ -368,3 +396,4 @@ the memory available to the JVM, i.e:
 
 <br/>
 
+`this is code`
\ No newline at end of file
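
Reviewer note: the guide text in this diff describes `univarOut.mtx` as triples of (statistic number, feature column number, value). As a rough illustration only, the following Python sketch shows one way a reader might map such triples back to statistic names. It is not part of SystemML or of this commit; the function name `parse_triples` and the `STAT_NAMES` dictionary are hypothetical, the names are taken from the table this commit removes, and the sketch assumes whitespace-separated text triples with no header line.

```python
from collections import defaultdict

# Statistic numbers and names as listed in the table removed by this commit.
STAT_NAMES = {
    1: "Minimum", 2: "Maximum", 3: "Range", 4: "Mean", 5: "Variance",
    6: "Standard deviation", 7: "Standard error of mean",
    8: "Coefficient of variation", 9: "Skewness", 10: "Kurtosis",
    11: "Standard error of skewness", 12: "Standard error of kurtosis",
    13: "Median", 14: "Interquartile mean", 15: "Number of categories",
    16: "Mode", 17: "Number of modes",
}


def parse_triples(lines):
    """Group 'statistic feature value' triples by feature column number.

    Hypothetical helper: assumes each line holds three whitespace-separated
    fields; lines that do not fit that shape are skipped.
    """
    stats = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        try:
            stat_no, feature_no, value = int(parts[0]), int(parts[1]), float(parts[2])
        except ValueError:
            continue  # not a numeric triple
        name = STAT_NAMES.get(stat_no, f"Statistic {stat_no}")
        stats[feature_no][name] = value
    return dict(stats)


# Sample triples copied from the excerpt shown in the diff above.
sample = ["1 1 30.0", "1 2 58.0", "16 4 1.0", "17 4 1.0"]
result = parse_triples(sample)
for feature_no in sorted(result):
    for name, value in result[feature_no].items():
        print(f"Feature [{feature_no}] {name}: {value}")
```

To apply this to a real output file, the lines could be read with `open(path)` and passed to `parse_triples`, assuming the file is stored in the text triple format shown in the guide.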
