This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 5a48eb8  [SPARK-34415][ML] Python example
5a48eb8 is described below

commit 5a48eb8d00faee3a7c8f023c0699296e22edb893
Author: Phillip Henry <phillhe...@gmail.com>
AuthorDate: Sun Feb 28 17:01:13 2021 -0600

    [SPARK-34415][ML] Python example

    Missing Python example file for [SPARK-34415][ML] Randomization in
    hyperparameter optimization (https://github.com/apache/spark/pull/31535)

    ### What changes were proposed in this pull request?
    For some reason (probably me being silly) the file
    examples/src/main/python/ml/model_selection_random_hyperparameters_example.py
    was not pushed in a previous PR. This PR restores that file.

    ### Why are the changes needed?
    A single file (examples/src/main/python/ml/model_selection_random_hyperparameters_example.py)
    should have been pushed as part of SPARK-34415 but was not. This was causing
    lint errors, as highlighted by dongjoon-hyun. Consequently, srowen asked for
    a new PR.

    ### Does this PR introduce _any_ user-facing change?
    No, it merely restores a file that was overlooked in SPARK-34415.

    ### How was this patch tested?
    By running:
    `bin/spark-submit examples/src/main/python/ml/model_selection_random_hyperparameters_example.py`

    Closes #31687 from PhillHenry/SPARK-34415_model_selection_random_hyperparameters_example.

    Authored-by: Phillip Henry <phillhe...@gmail.com>
    Signed-off-by: Sean Owen <sro...@gmail.com>
---
 ...del_selection_random_hyperparameters_example.py | 66 ++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/examples/src/main/python/ml/model_selection_random_hyperparameters_example.py b/examples/src/main/python/ml/model_selection_random_hyperparameters_example.py
new file mode 100644
index 0000000..b436341
--- /dev/null
+++ b/examples/src/main/python/ml/model_selection_random_hyperparameters_example.py
@@ -0,0 +1,66 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+This example uses random hyperparameters to perform model selection.
+Run with:
+
+  bin/spark-submit examples/src/main/python/ml/model_selection_random_hyperparameters_example.py
+"""
+# $example on$
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamRandomBuilder, CrossValidator
+# $example off$
+from pyspark.sql import SparkSession
+
+if __name__ == "__main__":
+    spark = SparkSession \
+        .builder \
+        .appName("TrainValidationSplit") \
+        .getOrCreate()
+
+    # $example on$
+    data = spark.read.format("libsvm") \
+        .load("data/mllib/sample_linear_regression_data.txt")
+
+    lr = LinearRegression(maxIter=10)
+
+    # We sample the regularization parameter logarithmically over the range [0.01, 1.0].
+    # This means that values around 0.01, 0.1 and 1.0 are roughly equally likely.
+    # Note that both parameters must be greater than zero as otherwise we'll get an infinity.
+    # We sample the ElasticNet mixing parameter uniformly over the range [0, 1].
+    # Note that in real life, you'd choose more than the 5 samples we see below.
+    hyperparameters = ParamRandomBuilder() \
+        .addLog10Random(lr.regParam, 0.01, 1.0, 5) \
+        .addRandom(lr.elasticNetParam, 0.0, 1.0, 5) \
+        .addGrid(lr.fitIntercept, [False, True]) \
+        .build()
+
+    cv = CrossValidator(estimator=lr,
+                        estimatorParamMaps=hyperparameters,
+                        evaluator=RegressionEvaluator(),
+                        numFolds=2)
+
+    model = cv.fit(data)
+    bestModel = model.bestModel
+    print("Optimal model has regParam = {}, elasticNetParam = {}, fitIntercept = {}"
+          .format(bestModel.getRegParam(), bestModel.getElasticNetParam(),
+                  bestModel.getFitIntercept()))
+
+    # $example off$
+    spark.stop()

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
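[Editor's note] The intuition behind `addLog10Random` in the example above — that values near 0.01, 0.1 and 1.0 become "roughly equally likely" — can be sketched in plain Python without Spark. The helper name `log10_uniform_samples` below is hypothetical (it is not the Spark API); it simply draws values whose base-10 logarithms are uniform over the range, which is what log-scaled sampling means:

```python
import math
import random


def log10_uniform_samples(low, high, n, seed=42):
    # Hypothetical helper, not part of pyspark: draw n values whose base-10
    # logarithms are uniformly distributed over [log10(low), log10(high)].
    # Both bounds must be > 0, since log10 of zero is undefined (this mirrors
    # the "greater than zero" caveat in the example's comments).
    rng = random.Random(seed)
    return [10 ** rng.uniform(math.log10(low), math.log10(high))
            for _ in range(n)]


samples = log10_uniform_samples(0.01, 1.0, 100_000)

# Every sample stays inside the requested range.
assert all(0.01 <= s <= 1.0 for s in samples)

# [0.01, 1.0] spans two decades, so about half of the samples land below 0.1:
# each decade-wide slice of the range gets an equal share of the draws.
fraction_below_tenth = sum(s < 0.1 for s in samples) / len(samples)
print("fraction below 0.1: {:.3f}".format(fraction_below_tenth))
```

A plain uniform draw over [0.01, 1.0] would put roughly 90% of the samples above 0.1, starving the small-regularization end of the range; the log-scaled draw is why the example's comment says each order of magnitude is covered about equally.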