Repository: systemml
Updated Branches:
  refs/heads/gh-pages b5a248a4b -> e184e966e


[SYSTEMML-1929] Update Spark parameters in sparkDML.sh and docs

Closes #670.


Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/e184e966
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/e184e966
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/e184e966

Branch: refs/heads/gh-pages
Commit: e184e966e2103062e5381c11a48c9e2bb53b5687
Parents: b5a248a
Author: Glenn Weidner <[email protected]>
Authored: Sat Oct 7 17:22:18 2017 -0700
Committer: Glenn Weidner <[email protected]>
Committed: Sat Oct 7 17:22:18 2017 -0700

----------------------------------------------------------------------
 algorithms-classification.md         | 88 +++++++++++++++----------------
 algorithms-clustering.md             | 28 +++++-----
 algorithms-descriptive-statistics.md | 28 +++++-----
 algorithms-matrix-factorization.md   | 36 ++++++-------
 algorithms-regression.md             | 72 ++++++++++++-------------
 algorithms-survival-analysis.md      | 32 +++++------
 spark-batch-mode.md                  |  8 +--
 spark-mlcontext-programming-guide.md |  4 +-
 8 files changed, 148 insertions(+), 148 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/e184e966/algorithms-classification.md
----------------------------------------------------------------------
diff --git a/algorithms-classification.md b/algorithms-classification.md
index 1895103..62e40e7 100644
--- a/algorithms-classification.md
+++ b/algorithms-classification.md
@@ -160,9 +160,9 @@ val prediction = model.transform(X_test_df)
                                     fmt=[format]
 </div>
 <div data-lang="Spark" markdown="1">
-    $SPARK_HOME/bin/spark-submit --master yarn-cluster
+    $SPARK_HOME/bin/spark-submit --master yarn
+                                 --deploy-mode cluster
                                  --conf spark.driver.maxResultSize=0
-                                 --conf spark.akka.frameSize=128
                                  SystemML.jar
                                  -f MultiLogReg.dml
                                  -config SystemML-config.xml
@@ -331,9 +331,9 @@ prediction.show()
                                     Log=/user/ml/log.csv
 </div>
 <div data-lang="Spark"
markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f MultiLogReg.dml -config SystemML-config.xml @@ -527,9 +527,9 @@ val model = svm.fit(X_train_df) fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f l2-svm.dml -config SystemML-config.xml @@ -574,9 +574,9 @@ val prediction = model.transform(X_test_df) fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f l2-svm-predict.dml -config SystemML-config.xml @@ -658,9 +658,9 @@ more details on the Python API. Log=/user/ml/Log.csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f l2-svm.dml -config SystemML-config.xml @@ -692,9 +692,9 @@ more details on the Python API. 
confusion=/user/ml/confusion.csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f l2-svm-predict.dml -config SystemML-config.xml @@ -797,9 +797,9 @@ val model = svm.fit(X_train_df) fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f m-svm.dml -config SystemML-config.xml @@ -844,9 +844,9 @@ val prediction = model.transform(X_test_df) fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f m-svm-predict.dml -config SystemML-config.xml @@ -1009,9 +1009,9 @@ prediction.show() Log=/user/ml/Log.csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f m-svm.dml -config SystemML-config.xml @@ -1043,9 +1043,9 @@ prediction.show() confusion=/user/ml/confusion.csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f m-svm-predict.dml -config SystemML-config.xml @@ -1148,9 +1148,9 @@ val model = nb.fit(X_train_df) fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + 
--deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f naive-bayes.dml -config SystemML-config.xml @@ -1193,9 +1193,9 @@ val prediction = model.transform(X_test_df) probabilities=[file] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f naive-bayes-predict.dml -config SystemML-config.xml @@ -1284,9 +1284,9 @@ metrics.f1_score(newsgroups_test.target, pred, average='weighted') accuracy=/user/ml/accuracy.csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f naive-bayes.dml -config SystemML-config.xml @@ -1316,9 +1316,9 @@ metrics.f1_score(newsgroups_test.target, pred, average='weighted') confusion=/user/ml/confusion.csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f naive-bayes-predict.dml -config SystemML-config.xml @@ -1415,9 +1415,9 @@ implementation is well-suited to handle large-scale data and builds a fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f decision-tree.dml -config SystemML-config.xml @@ -1453,9 +1453,9 @@ implementation is well-suited to handle large-scale data and builds a fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + 
$SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f decision-tree-predict.dml -config SystemML-config.xml @@ -1553,9 +1553,9 @@ SystemML Language Reference for details. fmt=csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f decision-tree.dml -config SystemML-config.xml @@ -1588,9 +1588,9 @@ SystemML Language Reference for details. fmt=csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f decision-tree-predict.dml -config SystemML-config.xml @@ -1823,9 +1823,9 @@ for classification in parallel. fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f random-forest.dml -config SystemML-config.xml @@ -1866,9 +1866,9 @@ for classification in parallel. fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f random-forest-predict.dml -config SystemML-config.xml @@ -1989,9 +1989,9 @@ SystemML Language Reference for details. 
fmt=csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f random-forest.dml -config SystemML-config.xml @@ -2027,9 +2027,9 @@ To compute predictions: fmt=csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f random-forest-predict.dml -config SystemML-config.xml http://git-wip-us.apache.org/repos/asf/systemml/blob/e184e966/algorithms-clustering.md ---------------------------------------------------------------------- diff --git a/algorithms-clustering.md b/algorithms-clustering.md index 7554660..358a53a 100644 --- a/algorithms-clustering.md +++ b/algorithms-clustering.md @@ -129,9 +129,9 @@ apart is a "false negative" etc. verb=[boolean] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f Kmeans.dml -config SystemML-config.xml @@ -163,9 +163,9 @@ apart is a "false negative" etc. 
O=[file] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f Kmeans-predict.dml -config SystemML-config.xml @@ -255,9 +255,9 @@ standard output fmt=csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f Kmeans.dml -config SystemML-config.xml @@ -284,9 +284,9 @@ standard output verb=1 </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f Kmeans.dml -config SystemML-config.xml @@ -317,9 +317,9 @@ To predict Y given X and C: O=/user/ml/stats.csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f Kmeans-predict.dml -config SystemML-config.xml @@ -343,9 +343,9 @@ given X and C: O=/user/ml/stats.csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f Kmeans-predict.dml -config SystemML-config.xml @@ -368,9 +368,9 @@ labels prY: O=/user/ml/stats.csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 
SystemML.jar -f Kmeans-predict.dml -config SystemML-config.xml http://git-wip-us.apache.org/repos/asf/systemml/blob/e184e966/algorithms-descriptive-statistics.md ---------------------------------------------------------------------- diff --git a/algorithms-descriptive-statistics.md b/algorithms-descriptive-statistics.md index f45ffae..1c86368 100644 --- a/algorithms-descriptive-statistics.md +++ b/algorithms-descriptive-statistics.md @@ -125,9 +125,9 @@ to compute the mean of a categorical attribute like “Hair Color”. STATS=<file> </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f Univar-Stats.dml -config SystemML-config.xml @@ -164,9 +164,9 @@ be stored. The format of the output matrix is defined by STATS=/user/ml/stats.mtx </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f Univar-Stats.dml -config SystemML-config.xml @@ -585,9 +585,9 @@ attributes like “Hair Color”. OUTDIR=<directory> </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f bivar-stats.dml -config SystemML-config.xml @@ -654,9 +654,9 @@ are defined in [**Table 2**](algorithms-descriptive-statistics.html#table2).
OUTDIR=/user/ml/stats.mtx </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f bivar-stats.dml -config SystemML-config.xml @@ -1147,9 +1147,9 @@ becomes reversed and amplified (from $+0.1$ to $-0.5$) if we ignore the months. fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f stratstats.dml -config SystemML-config.xml @@ -1355,9 +1355,9 @@ SystemML Language Reference for details. fmt=csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f stratstats.dml -config SystemML-config.xml @@ -1383,9 +1383,9 @@ SystemML Language Reference for details. O=/user/ml/Out.mtx </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f stratstats.dml -config SystemML-config.xml http://git-wip-us.apache.org/repos/asf/systemml/blob/e184e966/algorithms-matrix-factorization.md ---------------------------------------------------------------------- diff --git a/algorithms-matrix-factorization.md b/algorithms-matrix-factorization.md index 8777130..b559cb5 100644 --- a/algorithms-matrix-factorization.md +++ b/algorithms-matrix-factorization.md @@ -56,9 +56,9 @@ top-$K$ (for a given value of $K$) principal components. 
OUTPUT=<file> </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f PCA.dml -config SystemML-config.xml @@ -119,9 +119,9 @@ SystemML Language Reference for details. OUTPUT=/user/ml/pca_output/ </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f PCA.dml -config SystemML-config.xml @@ -149,9 +149,9 @@ SystemML Language Reference for details. OUTPUT=/user/ml/test_output.mtx </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f PCA.dml -config SystemML-config.xml @@ -257,9 +257,9 @@ problems. fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f ALS.dml -config SystemML-config.xml @@ -291,9 +291,9 @@ problems. fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f ALS_predict.dml -config SystemML-config.xml @@ -322,9 +322,9 @@ problems. 
fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f ALS_topk_predict.dml -config SystemML-config.xml @@ -431,9 +431,9 @@ SystemML Language Reference for details. fmt=csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f ALS.dml -config SystemML-config.xml @@ -467,9 +467,9 @@ To compute predicted ratings for a given list of users and items: fmt=csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f ALS_predict.dml -config SystemML-config.xml @@ -501,9 +501,9 @@ predicted ratings for a given list of users: fmt=csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f ALS_topk_predict.dml -config SystemML-config.xml http://git-wip-us.apache.org/repos/asf/systemml/blob/e184e966/algorithms-regression.md ---------------------------------------------------------------------- diff --git a/algorithms-regression.md b/algorithms-regression.md index df2ad3e..18640b8 100644 --- a/algorithms-regression.md +++ b/algorithms-regression.md @@ -102,9 +102,9 @@ y_test = lr.fit(df_train) fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf 
spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f LinearRegDS.dml -config SystemML-config.xml @@ -147,9 +147,9 @@ y_test = lr.fit(df_train) fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f LinearRegCG.dml -config SystemML-config.xml @@ -254,9 +254,9 @@ print("Residual sum of squares: %.2f" % np.mean((regr.predict(diabetes_X_test) - reg=1.0 </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f LinearRegDS.dml -config SystemML-config.xml @@ -311,9 +311,9 @@ print("Residual sum of squares: %.2f" % np.mean((regr.predict(diabetes_X_test) - Log=/user/ml/log.csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f LinearRegCG.dml -config SystemML-config.xml @@ -552,9 +552,9 @@ lowest AIC is computed. fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f StepLinearRegDS.dml -config SystemML-config.xml @@ -623,9 +623,9 @@ SystemML Language Reference for details. 
fmt=csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f StepLinearRegDS.dml -config SystemML-config.xml @@ -755,9 +755,9 @@ distributions and link functions, see below for details. mii=[int] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f GLM.dml -config SystemML-config.xml @@ -893,9 +893,9 @@ if no maximum limit provided Log=/user/ml/log.csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f GLM.dml -config SystemML-config.xml @@ -1230,9 +1230,9 @@ distribution family is supported (see below for details). fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f StepGLM.dml -config SystemML-config.xml @@ -1335,9 +1335,9 @@ SystemML Language Reference for details. fmt=csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f StepGLM.dml -config SystemML-config.xml @@ -1481,9 +1481,9 @@ this step outside the scope of `GLM-predict.dml` for now. 
fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f GLM-predict.dml -config SystemML-config.xml @@ -1620,9 +1620,9 @@ unknown (which sets it to `1.0`). O=/user/ml/stats.csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f GLM-predict.dml -config SystemML-config.xml @@ -1656,9 +1656,9 @@ unknown (which sets it to `1.0`). fmt=csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f GLM-predict.dml -config SystemML-config.xml @@ -1690,9 +1690,9 @@ unknown (which sets it to `1.0`). O=/user/ml/stats.csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f GLM-predict.dml -config SystemML-config.xml @@ -1725,9 +1725,9 @@ unknown (which sets it to `1.0`). O=/user/ml/stats.csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f GLM-predict.dml -config SystemML-config.xml @@ -1758,9 +1758,9 @@ unknown (which sets it to `1.0`). 
O=/user/ml/stats.csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f GLM-predict.dml -config SystemML-config.xml @@ -1793,9 +1793,9 @@ unknown (which sets it to `1.0`). O=/user/ml/stats.csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f GLM-predict.dml -config SystemML-config.xml @@ -1832,9 +1832,9 @@ unknown (which sets it to `1.0`). O=/user/ml/stats.csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f GLM-predict.dml -config SystemML-config.xml http://git-wip-us.apache.org/repos/asf/systemml/blob/e184e966/algorithms-survival-analysis.md ---------------------------------------------------------------------- diff --git a/algorithms-survival-analysis.md b/algorithms-survival-analysis.md index 239ab08..943d4d7 100644 --- a/algorithms-survival-analysis.md +++ b/algorithms-survival-analysis.md @@ -57,9 +57,9 @@ censored and uncensored survival times. fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f KM.dml -config SystemML-config.xml @@ -152,9 +152,9 @@ SystemML Language Reference for details. 
fmt=csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f KM.dml -config SystemML-config.xml @@ -189,9 +189,9 @@ SystemML Language Reference for details. fmt=csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f KM.dml -config SystemML-config.xml @@ -461,9 +461,9 @@ may be categorical (ordinal or nominal) as well as continuous-valued. fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f Cox.dml -config SystemML-config.xml @@ -503,9 +503,9 @@ may be categorical (ordinal or nominal) as well as continuous-valued. fmt=[format] </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f Cox-predict.dml -config SystemML-config.xml @@ -612,9 +612,9 @@ SystemML Language Reference for details. fmt=csv </div> <div data-lang="Spark" markdown="1"> - $SPARK_HOME/bin/spark-submit --master yarn-cluster + $SPARK_HOME/bin/spark-submit --master yarn + --deploy-mode cluster --conf spark.driver.maxResultSize=0 - --conf spark.akka.frameSize=128 SystemML.jar -f Cox.dml -config SystemML-config.xml @@ -651,9 +651,9 @@ SystemML Language Reference for details. 
                                     fmt=csv
 </div>
 <div data-lang="Spark" markdown="1">
-    $SPARK_HOME/bin/spark-submit --master yarn-cluster
+    $SPARK_HOME/bin/spark-submit --master yarn
+                                 --deploy-mode cluster
                                  --conf spark.driver.maxResultSize=0
-                                 --conf spark.akka.frameSize=128
                                  SystemML.jar
                                  -f Cox.dml
                                  -config SystemML-config.xml
@@ -691,9 +691,9 @@ SystemML Language Reference for details.
                                     fmt=csv
 </div>
 <div data-lang="Spark" markdown="1">
-    $SPARK_HOME/bin/spark-submit --master yarn-cluster
+    $SPARK_HOME/bin/spark-submit --master yarn
+                                 --deploy-mode cluster
                                  --conf spark.driver.maxResultSize=0
-                                 --conf spark.akka.frameSize=128
                                  SystemML.jar
                                  -f Cox-predict.dml
                                  -config SystemML-config.xml

http://git-wip-us.apache.org/repos/asf/systemml/blob/e184e966/spark-batch-mode.md
----------------------------------------------------------------------
diff --git a/spark-batch-mode.md b/spark-batch-mode.md
index 7f8f4c0..349f17c 100644
--- a/spark-batch-mode.md
+++ b/spark-batch-mode.md
@@ -41,7 +41,7 @@ mode in more depth.
 
 # Spark Batch Mode Invocation Syntax
 
-SystemML can be invoked in Hadoop Batch mode using the following syntax:
+SystemML can be invoked in Spark Batch mode using the following syntax:
 
     spark-submit SystemML.jar [-? | -help | -f <filename>] (-config <config_filename>) ([-args | -nvargs] <args-list>)
 
@@ -63,7 +63,7 @@ to be deprecated. All the primary algorithm scripts included with SystemML use n
 # Execution modes
 
 SystemML works seamlessly with all Spark execution modes, including *local* (`--master local[*]`),
-*yarn client* (`--master yarn-client`), *yarn cluster* (`--master yarn-cluster`), *etc*. More
+*yarn client* (`--master yarn --deploy-mode client`), *yarn cluster* (`--master yarn --deploy-mode cluster`), *etc*. More
 information on Spark cluster execution modes can be found on the
 [official Spark cluster deployment documentation](https://spark.apache.org/docs/latest/cluster-overview.html).
 *Note* that Spark can be easily run on a laptop in local mode using the `--master local[*]` described
@@ -71,8 +71,8 @@ above, which SystemML supports.
 
 # Recommended Spark Configuration Settings
 
-For best performance, we recommend setting the following flags when running SystemML with Spark:
-`--conf spark.driver.maxResultSize=0 --conf spark.akka.frameSize=128`.
+For best performance, we recommend setting the following configuration value when running SystemML with Spark:
+`--conf spark.driver.maxResultSize=0`.
 
 # Examples
 

http://git-wip-us.apache.org/repos/asf/systemml/blob/e184e966/spark-mlcontext-programming-guide.md
----------------------------------------------------------------------
diff --git a/spark-mlcontext-programming-guide.md b/spark-mlcontext-programming-guide.md
index e935c65..63e48be 100644
--- a/spark-mlcontext-programming-guide.md
+++ b/spark-mlcontext-programming-guide.md
@@ -2814,5 +2814,5 @@ plt.title('PNMF Training Loss')
 
 # Recommended Spark Configuration Settings
 
-For best performance, we recommend setting the following flags when running SystemML with Spark:
-`--conf spark.driver.maxResultSize=0 --conf spark.akka.frameSize=128`.
+For best performance, we recommend setting the following configuration value when running SystemML with Spark:
+`--conf spark.driver.maxResultSize=0`.
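
For anyone carrying the same change into their own scripts, the pattern repeated across the hunks above can be sketched as a small shell helper. This is an illustration only, not the contents of sparkDML.sh: the function name `systemml_submit_cmd` is hypothetical, and the command line is printed rather than executed so the sketch runs without a Spark installation. It uses only the options that appear in this diff.

```shell
#!/bin/sh
# Hypothetical helper (not part of SystemML): prints a spark-submit command
# line in the form introduced by this commit, i.e. "--master yarn" plus
# "--deploy-mode <mode>" instead of "--master yarn-cluster", and without
# the removed spark.akka.frameSize setting.
systemml_submit_cmd() {
    script="$1"             # DML script to run, e.g. Kmeans.dml
    mode="${2:-cluster}"    # YARN deploy mode: "cluster" (default) or "client"
    # \$SPARK_HOME is kept literal so the printed line matches the docs.
    printf '%s\n' "\$SPARK_HOME/bin/spark-submit --master yarn --deploy-mode $mode --conf spark.driver.maxResultSize=0 SystemML.jar -f $script -config SystemML-config.xml"
}

# Print the cluster-mode command for one of the scripts touched above.
systemml_submit_cmd Kmeans.dml
```

Passing `client` as the second argument yields the `--deploy-mode client` form that replaces the old `--master yarn-client` spelling.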
