Xusen Yin created SPARK-13368:
---------------------------------
Summary: PySpark JavaModel fails to extract params from Spark side
automatically
Key: SPARK-13368
URL: https://issues.apache.org/jira/browse/SPARK-13368
Project: Spark
Issue Type: Bug
Components: PySpark
Reporter: Xusen Yin
JavaModel fails to extract params from the Spark side automatically, so
model.extractParamMap() is always empty. This is shown in the example code below,
copied from the Spark guide at https://spark.apache.org/docs/latest/ml-guide.html:
{code}
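from pyspark.ml.classification import LogisticRegression
from pyspark.mllib.linalg import Vectors

# Prepare training data from a list of (label, features) tuples.
# sqlContext is assumed to be the SQLContext provided by the PySpark shell.
training = sqlContext.createDataFrame([
    (1.0, Vectors.dense([0.0, 1.1, 0.1])),
    (0.0, Vectors.dense([2.0, 1.0, -1.0])),
    (0.0, Vectors.dense([2.0, 1.3, 1.0])),
    (1.0, Vectors.dense([0.0, 1.2, -0.5]))], ["label", "features"])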
# Create a LogisticRegression instance. This instance is an Estimator.
lr = LogisticRegression(maxIter=10, regParam=0.01)
# Print out the parameters, documentation, and any default values.
print("LogisticRegression parameters:\n" + lr.explainParams() + "\n")
# Learn a LogisticRegression model. This uses the parameters stored in lr.
model1 = lr.fit(training)
# Since model1 is a Model (i.e., a transformer produced by an Estimator),
# we can view the parameters it used during fit().
# This prints the parameter (name: value) pairs, where names are unique
# IDs for this LogisticRegression instance.
print("Model 1 was fit using parameters: ")
print(model1.extractParamMap())
{code}
The result of model1.extractParamMap() is {}.
The question is whether we should provide this feature. If yes, we need to either
let Model share the same params as its Estimator, or add a parent reference to Model
that points to its Estimator. If not, we should remove those lines from the example code.
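
To make the second option concrete, here is a minimal sketch in plain Python. The Params, Estimator and Model classes below are hypothetical stand-ins written for illustration only; they are not the real pyspark.ml classes and do not touch the JVM. The idea is simply that fit() hands the estimator to the model as its parent, and the model falls back to the parent's params when asked for its param map.

```python
# Hypothetical stand-ins for illustration; NOT the real pyspark.ml classes.
class Params(object):
    """Holds explicitly set param values keyed by name."""

    def __init__(self, **params):
        self._paramMap = dict(params)

    def extractParamMap(self):
        # Return a copy so callers cannot mutate internal state.
        return dict(self._paramMap)


class Model(Params):
    """A model that keeps a reference to the estimator that produced it."""

    def __init__(self, parent=None):
        super(Model, self).__init__()
        self.parent = parent

    def extractParamMap(self):
        # Start from the parent's params, then overlay the model's own.
        merged = self.parent.extractParamMap() if self.parent is not None else {}
        merged.update(self._paramMap)
        return merged


class Estimator(Params):
    def fit(self, dataset):
        # Training is omitted; only the param plumbing is shown.
        return Model(parent=self)


lr = Estimator(maxIter=10, regParam=0.01)
model1 = lr.fit(None)
print(model1.extractParamMap())
```

With this arrangement a model built without a parent still returns an empty map, so the current behaviour is unchanged for models that were not produced by fit().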