[jira] [Commented] (SPARK-10014) ML model broadcasts should be stored in private vars

Sameer Abhyankar (JIRA) Fri, 11 Sep 2015 06:28:57 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740786#comment-14740786
 ]


Sameer Abhyankar commented on SPARK-10014:
------------------------------------------

[~josephkb] [~mengxr] thanks for the feedback. I agree that, since there is 
no clean way to lock down the trait, it is probably safer to repeat the 
broadcast code in each model.

As for the issue of not rebroadcasting on every predict: how about we store 
a hashcode of the pertinent weights in a private var when we broadcast? 
Then, every time predict is called, we rebroadcast only when the model has 
not been broadcast before OR when the hashcode of the pertinent weights has 
changed (e.g. the hashcode of labels, pi and theta for the NaiveBayesModel). 
Thoughts?
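
To make the proposal concrete, here is a minimal sketch of the idea in Python. 
It is not Spark code: the class name CachedBroadcastModel, the method 
broadcast_weights and the broadcast_fn parameter are all hypothetical, with 
broadcast_fn standing in for whatever actually performs the broadcast. Only 
the labels, pi and theta fields are taken from the NaiveBayesModel example.

```python
class CachedBroadcastModel:
    """Hypothetical model wrapper that rebroadcasts only when its weights change."""

    def __init__(self, labels, pi, theta, broadcast_fn):
        self.labels = labels              # class labels
        self.pi = pi                      # per-class log priors
        self.theta = theta                # per-class log likelihoods (list of rows)
        self._broadcast_fn = broadcast_fn  # assumption: stand-in for the real broadcast call
        self._bc = None                   # cached broadcast handle, kept private
        self._bc_hash = None              # hash of the weights at the last broadcast

    def _weights_hash(self):
        # Hash every value that prediction depends on.
        return hash((
            tuple(self.labels),
            tuple(self.pi),
            tuple(tuple(row) for row in self.theta),
        ))

    def broadcast_weights(self):
        # Rebroadcast if nothing was broadcast yet OR the weights' hash changed.
        current = self._weights_hash()
        if self._bc is None or current != self._bc_hash:
            self._bc = self._broadcast_fn((self.labels, self.pi, self.theta))
            self._bc_hash = current
        return self._bc
```

Two caveats with this approach: the hash has to be recomputed on every predict 
call, which costs time proportional to the size of the weights, and a hash 
collision between old and new weights would skip a rebroadcast that is needed.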

> ML model broadcasts should be stored in private vars
> ----------------------------------------------------
>
>                 Key: SPARK-10014
>                 URL: https://issues.apache.org/jira/browse/SPARK-10014
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML, MLlib
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>
> In multiple places in MLlib, we broadcast a model before prediction.  Since 
> prediction may be called many times, we should store the broadcast variable 
> in a private var so that we broadcast at most once.
> I'll link subtasks for each problem case I find.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)