[ 
https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193886#comment-15193886
 ] 

Xusen Yin commented on SPARK-11136:
-----------------------------------

This is a good point. In the current implementation, the new KMeans uses only 
the model itself (i.e. the array of cluster centers), not its parameters. 
E.g.

{code}
if (isSet(initialModel)) {
  require($(initialModel).parentModel.clusterCenters.length == $(k),
    "mismatched cluster count")
  require(rdd.first().size == $(initialModel).clusterCenters.head.size,
    "mismatched dimension")
  algo.setInitialModel($(initialModel).parentModel)
}
{code}

But I think you're right. We should also carry over the initial model's 
parameters in some scenarios. IMHO, the parameter overriding order should be 
initialModel parameter < default parameter < user-set parameter, where each 
source overrides the ones to its left. What do you think?
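
To make the proposed order concrete, here is a minimal sketch. It is not the 
Spark implementation: plain Scala Maps stand in for ParamMap, and the name 
ParamPrecedence.resolve is hypothetical. It only illustrates that later 
sources override earlier ones.

```scala
// Hypothetical sketch: plain Maps stand in for Spark's ParamMap.
object ParamPrecedence {

  /** Merge three parameter sources into one resolved map.
    *
    * Map's `++` keeps the right-hand value when a key appears on both sides,
    * so the resulting precedence is:
    *   initialModel parameter < default parameter < user-set parameter
    */
  def resolve(
      initialModelParams: Map[String, Any],
      defaultParams: Map[String, Any],
      userSetParams: Map[String, Any]): Map[String, Any] =
    initialModelParams ++ defaultParams ++ userSetParams
}
```

For example, with initialModelParams = Map("k" -> 3, "maxIter" -> 50), 
defaultParams = Map("maxIter" -> 20) and userSetParams = Map("k" -> 5), the 
resolved map has k = 5 and maxIter = 20. Note that under this order a default 
value always wins over the initial model's value, so an initial model's 
parameter only survives if it has no default and the user did not set it.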

> Warm-start support for ML estimator
> -----------------------------------
>
>                 Key: SPARK-11136
>                 URL: https://issues.apache.org/jira/browse/SPARK-11136
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Xusen Yin
>            Priority: Minor
>
> The current implementation of Estimator does not support warm-start fitting, 
> i.e. estimator.fit(data, params, partialModel). But first we need to add 
> warm-start for all ML estimators. This is an umbrella JIRA to add support for 
> the warm-start estimator. 
> Treat the model as a special parameter and pass it through ParamMap, e.g. val 
> partialModel: Param[Option[M]] = new Param(...). If a model is present, we 
> use it to warm-start; otherwise we start the training process from the 
> beginning.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
