[ https://issues.apache.org/jira/browse/SPARK-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740786#comment-14740786 ]
Sameer Abhyankar commented on SPARK-10014:
------------------------------------------
[~josephkb] [~mengxr] Thanks for the feedback, guys. I agree that, since there
is no clean way to lock down the trait, it might be safer to repeat the
broadcast code in each model.
As for not rebroadcasting on every predict: how about we store a hash code of
the pertinent weights in a private var when we broadcast? Then, every time
predict is called, we rebroadcast only if the model has not been broadcast
before OR the hash code of the pertinent weights has changed (e.g., the hash
codes of labels, pi, and theta for NaiveBayesModel).
Thoughts?
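A minimal sketch of the hash-based proposal, assuming a toy model that holds the same three weight arrays as NaiveBayesModel (labels, pi, theta, with pi and theta as log probabilities). To run without Spark, the hypothetical FakeBroadcast case class stands in for org.apache.spark.broadcast.Broadcast; in MLlib the cache-miss branch would call sc.broadcast(...) instead. ToyNaiveBayesModel, weightsHash, and broadcastCount are illustrative names, not MLlib API.

```scala
import java.util.Arrays

// Hypothetical stand-in for org.apache.spark.broadcast.Broadcast, so the
// sketch runs without a SparkContext.
final case class FakeBroadcast[T](value: T, id: Int)

// Hypothetical toy model holding the same weights as NaiveBayesModel.
// The arrays are mutable, which is why a content-based hash is needed.
final class ToyNaiveBayesModel(
    val labels: Array[Double],
    val pi: Array[Double],
    val theta: Array[Array[Double]]) {

  private type Weights = (Array[Double], Array[Double], Array[Array[Double]])

  // Number of (simulated) broadcasts so far; exposed only to check the sketch.
  var broadcastCount = 0

  // Private cache: the last broadcast and the hash of the weights it was built from.
  private var bcWeights: Option[FakeBroadcast[Weights]] = None
  private var bcHash = 0

  // Array.hashCode on the JVM is identity-based, so hash the element values
  // with java.util.Arrays.hashCode to detect in-place updates.
  private def weightsHash: Int = {
    val thetaRowHashes = theta.map(row => Arrays.hashCode(row))
    Arrays.hashCode(
      Array(Arrays.hashCode(labels), Arrays.hashCode(pi), Arrays.hashCode(thetaRowHashes)))
  }

  // Rebroadcast only if nothing was broadcast yet or the weights hash changed.
  private def broadcastWeights(): FakeBroadcast[Weights] = synchronized {
    val h = weightsHash
    bcWeights match {
      case Some(bc) if h == bcHash => bc
      case _ =>
        // In MLlib this branch would unpersist the stale broadcast and call sc.broadcast(...).
        broadcastCount += 1
        // Copy the arrays to mimic the serialized snapshot that executors receive.
        val bc = FakeBroadcast[Weights]((labels.clone(), pi.clone(), theta.map(_.clone())), broadcastCount)
        bcWeights = Some(bc)
        bcHash = h
        bc
    }
  }

  // Multinomial naive Bayes in log space: argmax over i of pi(i) + theta(i) . x.
  def predict(data: Seq[Array[Double]]): Seq[Double] = {
    val (ls, p, th) = broadcastWeights().value
    data.map { x =>
      val scores = p.indices.map(i => p(i) + th(i).zip(x).map { case (a, b) => a * b }.sum)
      ls(scores.indices.maxBy(i => scores(i)))
    }
  }
}
```

Two caveats with this design: computing the hash is still a full pass over the weights on every predict (much cheaper than a broadcast, but not free), and equal hashes do not strictly guarantee equal weights, so a collision could in principle leave a stale broadcast in place. In Spark, calling unpersist() on the replaced Broadcast would also avoid leaking executor memory.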
> ML model broadcasts should be stored in private vars
> ----------------------------------------------------
>
> Key: SPARK-10014
> URL: https://issues.apache.org/jira/browse/SPARK-10014
> Project: Spark
> Issue Type: Umbrella
> Components: ML, MLlib
> Reporter: Joseph K. Bradley
> Priority: Minor
>
> In multiple places in MLlib, we broadcast a model before prediction. Since
> prediction may be called many times, we should store the broadcast variable
> in a private var so that we broadcast at most once.
> I'll link subtasks for each problem case I find.
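For a model whose weights never change, the "broadcast at most once" idea from the issue reduces to caching the handle in a private var, repeated per model as discussed above. A minimal sketch, assuming a hypothetical FixedWeightsModel with a linear score; LocalBroadcast is a hypothetical stand-in for Spark's Broadcast, and in MLlib the cache miss would call sc.broadcast(...) using the SparkContext of the RDD passed to predict.

```scala
// Hypothetical stand-in for org.apache.spark.broadcast.Broadcast.
final case class LocalBroadcast[T](value: T, id: Int)

// Hypothetical model whose weights are fixed after construction.
final class FixedWeightsModel(weights: Array[Double]) {
  // Counts broadcasts so the "at most once" behaviour can be checked.
  private var broadcasts = 0
  def broadcastCount: Int = broadcasts

  // Private cache; None until the first prediction needs it.
  private var bcWeights: Option[LocalBroadcast[Array[Double]]] = None

  // In Spark the SparkContext is only at hand once predict receives an RDD,
  // so the handle is created lazily on first use rather than in the constructor.
  private def weightsBroadcast(): LocalBroadcast[Array[Double]] = synchronized {
    bcWeights.getOrElse {
      broadcasts += 1
      val bc = LocalBroadcast(weights, broadcasts)
      bcWeights = Some(bc)
      bc
    }
  }

  // Linear score for each input; every call reuses the same cached handle.
  def predict(data: Seq[Array[Double]]): Seq[Double] = {
    val w = weightsBroadcast().value
    data.map(x => w.zip(x).map { case (a, b) => a * b }.sum)
  }
}
```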
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)