Xiangrui Meng created SPARK-7407:
------------------------------------
Summary: Use uid and param name to identify a parameter instead of
the param object
Key: SPARK-7407
URL: https://issues.apache.org/jira/browse/SPARK-7407
Project: Spark
Issue Type: Improvement
Components: ML
Affects Versions: 1.4.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng
Transferring parameter values from one instance to another has been a pain point in
the ML pipeline implementation. Because we use the param object as the key in
the param map, we have to copy the params correctly whenever we copy a
transformer, estimator, or model. This becomes complicated when
meta-algorithms are involved. For example, in cross validation:
{code}
val cv = new CrossValidator()
  .setEstimator(lr)
  .setEstimatorParamMaps(epm)
{code}
When we make a copy of `cv` with extra params that contain estimator params,
{code}
cv.copy(ParamMap(cv.numFolds -> 3, lr.maxIter -> 10))
{code}
we need to make a copy of the `lr` object as well and remap `epm` from the
param keys of the old `lr` to those of the new copy. This is quite error-prone,
especially if the estimator itself is another meta-algorithm.
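To make the problem concrete, here is a minimal sketch using simplified stand-in classes. `Param`, `SimpleEstimator`, and `ObjectKeyedDemo` are hypothetical names for illustration, not the actual Spark ML classes; the sketch only assumes that param objects are compared by reference when used as map keys.

```scala
// Simplified stand-ins, not Spark's actual classes.
// Param does not override equals/hashCode, so map lookups use reference identity.
class Param(val name: String)

class SimpleEstimator {
  // Each instance creates its own Param object.
  val maxIter = new Param("maxIter")
}

object ObjectKeyedDemo {
  val lr = new SimpleEstimator
  val epm: Map[Param, Any] = Map(lr.maxIter -> 10)

  // A copy owns fresh Param objects, so the old keys no longer match.
  val lrCopy = new SimpleEstimator
  val foundOnCopy: Boolean = epm.contains(lrCopy.maxIter)

  // The map has to be re-keyed by hand for the copy. With a single param this
  // is trivial; with nested estimators it has to be repeated at every level.
  val remapped: Map[Param, Any] = epm.map { case (_, v) => lrCopy.maxIter -> v }
}
```

In this sketch the lookup with the copy's param misses, and only the manually re-keyed map can be used with `lrCopy`.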
Using uid + param name as the key in param maps, and keeping the same uid across
copies (and between estimator/model pairs), would simplify the implementation. We
would not need to change the keys, since the copied instance has the same uid as
the original instance. It would also be easier to find models in a fitted pipeline.
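A rough sketch of the proposed keying, under the same simplifying assumptions. `ParamKey`, `UidEstimator`, `UidKeyedDemo`, and the uid value `"logreg_1"` are hypothetical and only illustrate the idea; this is not the final API.

```scala
// Hypothetical sketch of uid + param name keys; not Spark's actual API.
// A case class gives structural equality: equal uid and name means equal key.
final case class ParamKey(parentUid: String, name: String)

class UidEstimator(val uid: String) {
  val maxIter: ParamKey = ParamKey(uid, "maxIter")

  // The copy keeps the uid, so its params produce keys equal to the original's.
  def copy(): UidEstimator = new UidEstimator(uid)
}

object UidKeyedDemo {
  val lr = new UidEstimator("logreg_1") // made-up uid for illustration
  val epm: Map[ParamKey, Any] = Map(lr.maxIter -> 10)

  // No re-keying needed: the copy's param resolves against the same map.
  val lrCopy: UidEstimator = lr.copy()
  val foundOnCopy: Boolean = epm.contains(lrCopy.maxIter)
}
```

Because the key depends only on the uid and the param name, `epm` can be passed unchanged to a copied estimator, including one nested inside another meta-algorithm.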