[ 
https://issues.apache.org/jira/browse/SPARK-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Tabladillo updated SPARK-21915:
------------------------------------
    Description: 
Error in PySpark example code
[https://github.com/apache/spark/blob/master/examples/src/main/python/ml/estimator_transformer_param_example.py]

The original Scala code says
println("Model 2 was fit using parameters: " + model2.parent.extractParamMap)

The parent is lr

There is no method for accessing parent as is done in Scala.

----

This code has been tested in Python, and returns values consistent with Scala


Proposing to call the lr variable instead of model1 or model2



----
This patch was tested with Spark 2.1.0 comparing the Scala and PySpark results. 
Pyspark returns nothing at present for those two print lines.

The output for model2 in PySpark should be

{Param(parent='LogisticRegression_4187be538f744d5a9090', name='tol', doc='the 
convergence tolerance for iterative algorithms (>= 0).'): 1e-06,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='elasticNetParam', 
doc='the ElasticNet mixing parameter, in range [0, 1]. For alpha = 0, the 
penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.'): 0.0,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='predictionCol', 
doc='prediction column name.'): 'prediction',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='featuresCol', 
doc='features column name.'): 'features',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='labelCol', 
doc='label column name.'): 'label',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='probabilityCol', 
doc='Column name for predicted class conditional probabilities. Note: Not all 
models output well-calibrated probability estimates! These probabilities should 
be treated as confidences, not precise probabilities.'): 'myProbability',
Param(parent='LogisticRegression_4187be538f744d5a9090', 
name='rawPredictionCol', doc='raw prediction (a.k.a. confidence) column 
name.'): 'rawPrediction',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='family', doc='The 
name of family which is a description of the label distribution to be used in 
the model. Supported options: auto, binomial, multinomial'): 'auto',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='fitIntercept', 
doc='whether to fit an intercept term.'): True,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='threshold', 
doc='Threshold in binary classification prediction, in range [0, 1]. If 
threshold and thresholds are both set, they must match.e.g. if threshold is p, 
then thresholds must be equal to [1-p, p].'): 0.55,
Param(parent='LogisticRegression_4187be538f744d5a9090', 
name='aggregationDepth', doc='suggested depth for treeAggregate (>= 2).'): 2,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='maxIter', 
doc='max number of iterations (>= 0).'): 30,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='regParam', 
doc='regularization parameter (>= 0).'): 0.1,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='standardization', 
doc='whether to standardize the training features before fitting the model.'): 
True}

  was:
The original Scala code says
println("Model 2 was fit using parameters: " + model2.parent.extractParamMap)

The parent is lr

There is no method for accessing parent as is done in Scala.

----

This code has been tested in Python, and returns values consistent with Scala


Proposing to call the lr variable instead of model1 or model2



----
This patch was tested with Spark 2.1.0 comparing the Scala and PySpark results. 
Pyspark returns nothing at present for those two print lines.

The output for model2 in PySpark should be

{Param(parent='LogisticRegression_4187be538f744d5a9090', name='tol', doc='the 
convergence tolerance for iterative algorithms (>= 0).'): 1e-06,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='elasticNetParam', 
doc='the ElasticNet mixing parameter, in range [0, 1]. For alpha = 0, the 
penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.'): 0.0,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='predictionCol', 
doc='prediction column name.'): 'prediction',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='featuresCol', 
doc='features column name.'): 'features',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='labelCol', 
doc='label column name.'): 'label',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='probabilityCol', 
doc='Column name for predicted class conditional probabilities. Note: Not all 
models output well-calibrated probability estimates! These probabilities should 
be treated as confidences, not precise probabilities.'): 'myProbability',
Param(parent='LogisticRegression_4187be538f744d5a9090', 
name='rawPredictionCol', doc='raw prediction (a.k.a. confidence) column 
name.'): 'rawPrediction',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='family', doc='The 
name of family which is a description of the label distribution to be used in 
the model. Supported options: auto, binomial, multinomial'): 'auto',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='fitIntercept', 
doc='whether to fit an intercept term.'): True,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='threshold', 
doc='Threshold in binary classification prediction, in range [0, 1]. If 
threshold and thresholds are both set, they must match.e.g. if threshold is p, 
then thresholds must be equal to [1-p, p].'): 0.55,
Param(parent='LogisticRegression_4187be538f744d5a9090', 
name='aggregationDepth', doc='suggested depth for treeAggregate (>= 2).'): 2,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='maxIter', 
doc='max number of iterations (>= 0).'): 30,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='regParam', 
doc='regularization parameter (>= 0).'): 0.1,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='standardization', 
doc='whether to standardize the training features before fitting the model.'): 
True}


> Model 1 and Model 2 ParamMaps Missing
> -------------------------------------
>
>                 Key: SPARK-21915
>                 URL: https://issues.apache.org/jira/browse/SPARK-21915
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, PySpark
>    Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.6.0, 1.6.1, 1.6.2, 1.6.3, 2.0.0, 
> 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.2.0
>            Reporter: Mark Tabladillo
>            Priority: Minor
>              Labels: easyfix
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Error in PySpark example code
> [https://github.com/apache/spark/blob/master/examples/src/main/python/ml/estimator_transformer_param_example.py]
> The original Scala code says
> println("Model 2 was fit using parameters: " + model2.parent.extractParamMap)
> The parent is lr
> There is no method for accessing parent as is done in Scala.
> ----
> This code has been tested in Python, and returns values consistent with Scala
> Proposing to call the lr variable instead of model1 or model2
> ----
> This patch was tested with Spark 2.1.0 comparing the Scala and PySpark 
> results. Pyspark returns nothing at present for those two print lines.
> The output for model2 in PySpark should be
> {Param(parent='LogisticRegression_4187be538f744d5a9090', name='tol', doc='the 
> convergence tolerance for iterative algorithms (>= 0).'): 1e-06,
> Param(parent='LogisticRegression_4187be538f744d5a9090', 
> name='elasticNetParam', doc='the ElasticNet mixing parameter, in range [0, 
> 1]. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 
> penalty.'): 0.0,
> Param(parent='LogisticRegression_4187be538f744d5a9090', name='predictionCol', 
> doc='prediction column name.'): 'prediction',
> Param(parent='LogisticRegression_4187be538f744d5a9090', name='featuresCol', 
> doc='features column name.'): 'features',
> Param(parent='LogisticRegression_4187be538f744d5a9090', name='labelCol', 
> doc='label column name.'): 'label',
> Param(parent='LogisticRegression_4187be538f744d5a9090', 
> name='probabilityCol', doc='Column name for predicted class conditional 
> probabilities. Note: Not all models output well-calibrated probability 
> estimates! These probabilities should be treated as confidences, not precise 
> probabilities.'): 'myProbability',
> Param(parent='LogisticRegression_4187be538f744d5a9090', 
> name='rawPredictionCol', doc='raw prediction (a.k.a. confidence) column 
> name.'): 'rawPrediction',
> Param(parent='LogisticRegression_4187be538f744d5a9090', name='family', 
> doc='The name of family which is a description of the label distribution to 
> be used in the model. Supported options: auto, binomial, multinomial'): 
> 'auto',
> Param(parent='LogisticRegression_4187be538f744d5a9090', name='fitIntercept', 
> doc='whether to fit an intercept term.'): True,
> Param(parent='LogisticRegression_4187be538f744d5a9090', name='threshold', 
> doc='Threshold in binary classification prediction, in range [0, 1]. If 
> threshold and thresholds are both set, they must match.e.g. if threshold is 
> p, then thresholds must be equal to [1-p, p].'): 0.55,
> Param(parent='LogisticRegression_4187be538f744d5a9090', 
> name='aggregationDepth', doc='suggested depth for treeAggregate (>= 2).'): 2,
> Param(parent='LogisticRegression_4187be538f744d5a9090', name='maxIter', 
> doc='max number of iterations (>= 0).'): 30,
> Param(parent='LogisticRegression_4187be538f744d5a9090', name='regParam', 
> doc='regularization parameter (>= 0).'): 0.1,
> Param(parent='LogisticRegression_4187be538f744d5a9090', 
> name='standardization', doc='whether to standardize the training features 
> before fitting the model.'): True}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to