[ https://issues.apache.org/jira/browse/SPARK-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph K. Bradley resolved SPARK-6457.
--------------------------------------
Resolution: Fixed
Fix Version/s: 1.4.0, 1.3.1
Fixed by [SPARK-6330]
> Error when calling Pyspark RandomForestModel.load
> -------------------------------------------------
>
> Key: SPARK-6457
> URL: https://issues.apache.org/jira/browse/SPARK-6457
> Project: Spark
> Issue Type: Bug
> Components: MLlib, PySpark
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
> Priority: Minor
> Fix For: 1.3.1, 1.4.0
>
>
> Reported by [https://github.com/catmonkeylee]:
> Summary: PySpark RandomForestModel.load fails in test script. It appears
> that the saved model file is empty.
> {quote}
> When I run the sample code in cluster mode, there is an error.
> Traceback (most recent call last):
> File "/data1/s/apps/spark-app/app/sample_rf.py", line 25, in <module>
> sameModel = RandomForestModel.load(sc, _model_path)
> File "/home/s/apps/spark/python/pyspark/mllib/util.py", line 254, in load
> java_model = cls._load_java(sc, path)
> File "/home/s/apps/spark/python/pyspark/mllib/util.py", line 250, in _load_java
> return java_obj.load(sc._jsc.sc(), path)
> File "/home/s/apps/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
> File "/home/s/apps/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.mllib.tree.model.RandomForestModel.load.
> : java.lang.UnsupportedOperationException: empty collection
> at org.apache.spark.rdd.RDD.first(RDD.scala:1191)
> at org.apache.spark.mllib.util.Loader$.loadMetadata(modelSaveLoad.scala:125)
> at org.apache.spark.mllib.tree.model.RandomForestModel$.load(treeEnsembleModels.scala:65)
> at org.apache.spark.mllib.tree.model.RandomForestModel.load(treeEnsembleModels.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> {quote}
> {quote}
> I ran the code on a Spark cluster; the Spark version is 1.3.0.
> The test code:
> ===================================
> from pyspark import SparkContext, SparkConf
> from pyspark.mllib.tree import RandomForest, RandomForestModel
> from pyspark.mllib.util import MLUtils
> conf = SparkConf().setAppName('LocalTest')
> sc = SparkContext(conf=conf)
> data = MLUtils.loadLibSVMFile(sc, 'data/mllib/sample_libsvm_data.txt')
> print data.count()
> (trainingData, testData) = data.randomSplit([0.7, 0.3])
> model = RandomForest.trainClassifier(trainingData, numClasses=2,
> categoricalFeaturesInfo={},
> numTrees=3, featureSubsetStrategy="auto",
> impurity='gini', maxDepth=4, maxBins=32)
> # Evaluate model on test instances and compute test error
> predictions = model.predict(testData.map(lambda x: x.features))
> labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
> testErr = labelsAndPredictions.filter(lambda (v, p): v != p).count() / float(testData.count())
> print('Test Error = ' + str(testErr))
> print('Learned classification forest model:')
> print(model.toDebugString())
> # Save and load model
> _model_path = "/home/s/apps/spark-app/data/myModelPath"
> model.save(sc, _model_path)
> sameModel = RandomForestModel.load(sc, _model_path)
> sc.stop()
> ===================
> run command:
> spark-submit --master spark://t0.q.net:7077 --executor-memory 1G sample_rf.py
> ======================
> Then I get this error :
> Traceback (most recent call last):
> File "/data1/s/apps/spark-app/app/sample_rf.py", line 25, in <module>
> sameModel = RandomForestModel.load(sc, _model_path)
> File "/home/s/apps/spark/python/pyspark/mllib/util.py", line 254, in load
> java_model = cls._load_java(sc, path)
> File "/home/s/apps/spark/python/pyspark/mllib/util.py", line 250, in _load_java
> return java_obj.load(sc._jsc.sc(), path)
> File "/home/s/apps/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
> File "/home/s/apps/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.mllib.tree.model.RandomForestModel.load.
> : java.lang.UnsupportedOperationException: empty collection
> at org.apache.spark.rdd.RDD.first(RDD.scala:1191)
> at org.apache.spark.mllib.util.Loader$.loadMetadata(modelSaveLoad.scala:125)
> at org.apache.spark.mllib.tree.model.RandomForestModel$.load(treeEnsembleModels.scala:65)
> at org.apache.spark.mllib.tree.model.RandomForestModel.load(treeEnsembleModels.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
> at py4j.Gateway.invoke(Gateway.java:259)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:207)
> at java.lang.Thread.run(Thread.java:724)
> {quote}
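
The stack trace above fails in Loader.loadMetadata when RDD.first is called on an empty collection, which fits the summary's observation that the saved model file appears to be empty. A minimal sketch of a pre-load check follows. It assumes the model was saved to a path visible on the driver's local filesystem and that its metadata is stored as part-* text files under a "metadata" subdirectory. The helper name metadata_looks_nonempty and that directory layout are assumptions for illustration, not part of the report.

```python
import glob
import os


def metadata_looks_nonempty(model_path):
    """Return True if <model_path>/metadata holds at least one non-empty part file.

    Assumption: the model sits on a local (or locally mounted) filesystem and
    its metadata is written as part-* files in a "metadata" subdirectory.
    """
    metadata_dir = os.path.join(model_path, "metadata")
    part_files = glob.glob(os.path.join(metadata_dir, "part-*"))
    # A missing directory yields no part files, so the result is False.
    return any(os.path.getsize(p) > 0 for p in part_files)
```

Calling metadata_looks_nonempty(_model_path) on the driver before RandomForestModel.load(sc, _model_path) would show whether the driver can see any metadata at that path. The check only inspects the local filesystem, so it says nothing about files written on other nodes.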