Lucas Partridge created SPARK-42825:
---------------------------------------
Summary: setParams() only sets explicitly named params. Is this
intentional or a bug?
Key: SPARK-42825
URL: https://issues.apache.org/jira/browse/SPARK-42825
Project: Spark
Issue Type: Question
Components: ML, PySpark
Affects Versions: 3.3.2
Reporter: Lucas Partridge
The Python signature and docstring of the setParams() method for the estimators
and transformers under pyspark.ml imply that any named param you don't pass
will be reset to its default value.
Example from
[https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.clustering.GaussianMixture.html#pyspark.ml.clustering.GaussianMixture.setParams]:
{code:python}
setParams(self, *, featuresCol="features", predictionCol="prediction", k=2,
          probabilityCol="probability", tol=0.01, maxIter=100, seed=None,
          aggregationDepth=2, weightCol=None){code}
In the extreme this would imply that if you called setParams() with no args
then _all_ the params would be reset to their default values.
But what actually happens is that _only_ the params passed in the call get
changed; the values of any other params aren't affected. So if you call
setParams() with no args then _no_ params get changed!
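The observed behavior can be reproduced with a small pure-Python sketch. It is a simplified model under stated assumptions, not pyspark's actual code: ToyMixture, its get() helper, and this keyword_only decorator are hypothetical stand-ins written for illustration. The idea shown is that the decorator records only the keyword arguments the caller actually passed, so the defaults in the signature never reach the setter.

```python
import functools


def keyword_only(func):
    """Hypothetical stand-in: record only the kwargs the caller actually passed."""
    @functools.wraps(func)
    def wrapper(self, **kwargs):
        # Defaults from the wrapped signature never appear in kwargs.
        self._input_kwargs = kwargs
        return func(self, **kwargs)
    return wrapper


class ToyMixture:
    """Toy class, not a pyspark estimator; mirrors part of the signature quoted above."""

    def __init__(self):
        self._values = {"k": 2, "tol": 0.01, "maxIter": 100}

    @keyword_only
    def setParams(self, *, k=2, tol=0.01, maxIter=100):
        # Apply only what was explicitly passed; the signature defaults are ignored.
        self._values.update(self._input_kwargs)
        return self

    def get(self, name):
        return self._values[name]


model = ToyMixture().setParams(k=5, tol=0.5)

model.setParams(maxIter=10)  # k and tol keep their earlier values
after_partial = (model.get("k"), model.get("tol"), model.get("maxIter"))

model.setParams()            # no arguments: nothing changes
after_empty = (model.get("k"), model.get("tol"), model.get("maxIter"))

print(after_partial, after_empty)
```

In this sketch the defaults in the setParams() signature are purely documentary, which is why a reader of the signature alone could expect a reset that never happens.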
So is this behavior by design? The name of the method suggests that it is, but
the behavior is counter-intuitive given the docstring. If it is intentional,
then perhaps the default docstring should make this explicit by saying
something like:
"Sets the named params. The values of other params are not affected."