Repository: incubator-systemml Updated Branches: refs/heads/master d69f3441c -> 0a89676fa
[SYSTEMML-1380] Kmeans isY and verb parameters should be boolean Change Kmeans.dml isY and verb parameters to boolean. Closes #516. Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/0a89676f Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/0a89676f Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/0a89676f Branch: refs/heads/master Commit: 0a89676fa67a891bdcc87bc526071e53a35a2d87 Parents: d69f344 Author: krishnakalyan3 <[email protected]> Authored: Thu May 25 23:00:52 2017 -0700 Committer: Deron Eriksson <[email protected]> Committed: Thu May 25 23:00:52 2017 -0700 ---------------------------------------------------------------------- docs/algorithms-clustering.md | 14 +++++++------- scripts/algorithms/Kmeans.dml | 14 +++++++------- 2 files changed, 14 insertions(+), 14 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/0a89676f/docs/algorithms-clustering.md ---------------------------------------------------------------------- diff --git a/docs/algorithms-clustering.md b/docs/algorithms-clustering.md index 0c91fa1..7554660 100644 --- a/docs/algorithms-clustering.md +++ b/docs/algorithms-clustering.md @@ -123,10 +123,10 @@ apart is a "false negative" etc. maxi=[int] tol=[double] samp=[int] - isY=[int] + isY=[boolean] Y=[file] fmt=[format] - verb=[int] + verb=[boolean] </div> <div data-lang="Spark" markdown="1"> $SPARK_HOME/bin/spark-submit --master yarn-cluster @@ -143,10 +143,10 @@ apart is a "false negative" etc. maxi=[int] tol=[double] samp=[int] - isY=[int] + isY=[boolean] Y=[file] fmt=[format] - verb=[int] + verb=[boolean] </div> </div> @@ -203,14 +203,14 @@ in the centroid initialization procedure available mapping of records to clusters (defined by the output centroids) -**isY**: (default: `0`) `0` = do not write matrix $Y$, `1` = write $Y$ +**isY**: (default: `FALSE`) Do not write matrix $Y$ **fmt**: (default: `"text"`) Matrix file output format, such as `text`, `mm`, or `csv`; see read/write functions in SystemML Language Reference for details. -**verb**: (default: `0`) `0` = do not print per-iteration statistics for -each run, `1` = print them (the "verbose" option) +**verb**: (default: `FALSE`) Do not print per-iteration statistics for +each run ### Arguments - K-Means Prediction http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/0a89676f/scripts/algorithms/Kmeans.dml ---------------------------------------------------------------------- diff --git a/scripts/algorithms/Kmeans.dml b/scripts/algorithms/Kmeans.dml index 85e13c5..386c85f 100644 --- a/scripts/algorithms/Kmeans.dml +++ b/scripts/algorithms/Kmeans.dml @@ -33,15 +33,15 @@ # tol Double 0.000001 Tolerance (epsilon) for WCSS change ratio # samp Int 50 Average number of records per centroid in data samples # C String "C.mtx" Location to store the output matrix with the centroids -# isY Int 0 0 = do not write Y, 1 = write Y +# isY Boolean FALSE do not write Y # Y String "Y.mtx" Location to store the mapping of records to centroids # fmt String "text" Matrix output format, usually "text" or "csv" -# verb Int 0 0 = do not print per-iteration stats, 1 = print them +# verb Boolean FALSE do not print per-iteration stats # ---------------------------------------------------------------------------- # # Example: # hadoop jar SystemML.jar -f Kmeans.dml -nvargs X=X.mtx k=5 C=centroids.mtx -# hadoop jar SystemML.jar -f Kmeans.dml -nvargs X=X.mtx k=5 runs=100 maxi=5000 tol=0.00000001 samp=20 C=centroids.mtx isY=1 Y=clusters.mtx verb=1 +# hadoop jar SystemML.jar -f Kmeans.dml -nvargs X=X.mtx k=5 runs=100 maxi=5000 tol=0.00000001 samp=20 C=centroids.mtx isY=TRUE Y=clusters.mtx verb=TRUE fileX = $X; fileY = ifdef ($Y, "Y.mtx"); @@ -51,8 +51,8 @@ num_centroids = $k; num_runs = ifdef ($runs, 10); # $runs=10; max_iter = ifdef ($maxi, 1000); # $maxi=1000; eps = ifdef ($tol, 0.000001); # $tol=0.000001; -is_write_Y = ifdef ($isY, 0); # $isY=0; -is_verbose = ifdef ($verb, 0); # $verb=0; +is_write_Y = ifdef ($isY, FALSE); # $isY=FALSE; +is_verbose = ifdef ($verb, FALSE); # $verb=FALSE; fmtCY = ifdef ($fmt, "text"); # $fmt="text"; avg_sample_size_per_centroid = ifdef ($samp, 50); # $samp=50; @@ -149,7 +149,7 @@ parfor (run_index in 1 : num_runs, check = 0) # Compute the current centroid-based within-cluster sum of squares (WCSS) wcss_old = wcss; wcss = sumXsq + sum (minD); - if (is_verbose == 1) { + if (is_verbose == TRUE) { if (iter_count == 0) { print ("Run " + run_index + ", At Start-Up: Centroid WCSS = " + wcss); } else { @@ -208,7 +208,7 @@ if (num_successful_runs > 0) { C = All_Centroids [(num_centroids * (best_index - 1) + 1) : (num_centroids * best_index), ]; print ("Writing out the best-WCSS centroids..."); write (C, fileC, format=fmtCY); - if (is_write_Y == 1) { + if (is_write_Y == TRUE) { print ("Writing out the best-WCSS cluster labels..."); D = -2 * (X %*% t(C)) + t(rowSums (C ^ 2)); P = (D <= rowMins (D));
