Yeah, that's a bug; I can reproduce it. Can you open a JIRA? It works in Scala, so it must be an issue with the Python wrapper. The serialized model is fine; it's loading it back that fails.
I think it's because MultilayerPerceptronParams extends HasSolver, which defaults to 'auto', but doesn't seem to fully override it correctly: the model picks up this default, which isn't valid for MLP. Huaxin, maybe you have some insight? I think you have worked on this code recently.

On Wed, Jul 8, 2020 at 4:05 AM Steve Taylor <steve.tay...@symphonyretailai.com> wrote:
>
> Hi,
>
> I'm not sure if this is the right place to raise this; if not, hopefully you
> can direct me to the right place.
>
> I believe I have discovered a bug when loading
> MultilayerPerceptronClassificationModel in Spark 3.0.0, Scala 2.12, which I
> have tested and can see is not there in at least Spark 2.4.3, Scala 2.11.
> (I'm not sure if the Scala version is important.)
>
> I am using PySpark on a Databricks cluster and importing the library "from
> pyspark.ml.classification import MultilayerPerceptronClassificationModel".
>
> When running model = MultilayerPerceptronClassificationModel.load(...) and
> then model.transform(df), I get the following error:
>
> IllegalArgumentException: MultilayerPerceptronClassifier_8055d1368e78
> parameter solver given invalid value auto.
>
> This issue can be easily replicated by running the example given in the Spark
> documentation:
> http://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier
>
> Then adding save, load, and transform statements, as such:
>
> from pyspark.ml.classification import MultilayerPerceptronClassifier
> from pyspark.ml.evaluation import MulticlassClassificationEvaluator
>
> # Load training data
> data = spark.read.format("libsvm")\
>     .load("data/mllib/sample_multiclass_classification_data.txt")
>
> # Split the data into train and test
> splits = data.randomSplit([0.6, 0.4], 1234)
> train = splits[0]
> test = splits[1]
>
> # specify layers for the neural network:
> # input layer of size 4 (features), two intermediate of size 5 and 4
> # and output of size 3 (classes)
> layers = [4, 5, 4, 3]
>
> # create the trainer and set its parameters
> trainer = MultilayerPerceptronClassifier(maxIter=100, layers=layers,
>                                          blockSize=128, seed=1234)
>
> # train the model
> model = trainer.fit(train)
>
> # compute accuracy on the test set
> result = model.transform(test)
> predictionAndLabels = result.select("prediction", "label")
> evaluator = MulticlassClassificationEvaluator(metricName="accuracy")
> print("Test set accuracy = " + str(evaluator.evaluate(predictionAndLabels)))
>
> from pyspark.ml.classification import MultilayerPerceptronClassifier, \
>     MultilayerPerceptronClassificationModel
>
> model.save(Save_location)
> model2 = MultilayerPerceptronClassificationModel.load(Save_location)
>
> result_from_loaded = model2.transform(test)
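For anyone who needs a stop-gap until this is fixed, something like the following might work. This is an untested sketch: it assumes the loaded model still exposes the solver Param inherited from HasSolver (hence the shared 'auto' default), and that pinning it to 'l-bfgs', one of the two values MLP actually accepts, satisfies the validator at transform time. Save_location and test are from Steve's example above.

from pyspark.ml.classification import MultilayerPerceptronClassificationModel

model2 = MultilayerPerceptronClassificationModel.load(Save_location)

# Suspected cause: the loaded model carries HasSolver's generic default
# rather than an MLP-specific one, so this is expected to print "auto".
print(model2.getOrDefault(model2.solver))

# Untested workaround: overwrite the invalid default with a value the MLP
# validator accepts ("l-bfgs" or "gd") before calling transform().
model2.set(model2.solver, "l-bfgs")

result_from_loaded = model2.transform(test)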