[jira] [Commented] (SPARK-18781) Allow MatrixFactorizationModel.predict to skip user/product approximation count

Sean Owen (JIRA) Mon, 12 Dec 2016 05:15:33 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-18781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741856#comment-15741856
 ]


Sean Owen commented on SPARK-18781:
-----------------------------------

Amortized over a large number of records, I don't see this overhead being 
significant. I don't think it's worth overloading the API with a flag that 
is useful only in the rare case that you a) know something a priori about the 
sizes, b) know about this method, and c) have many tiny batches that for some 
reason can't be batched together.

> Allow MatrixFactorizationModel.predict to skip user/product approximation 
> count
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-18781
>                 URL: https://issues.apache.org/jira/browse/SPARK-18781
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Eyal Allweil
>            Priority: Minor
>
> When 
> [MatrixFactorizationModel.predict|https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.html#predict(org.apache.spark.rdd.RDD)]
>  is used, it first computes an approximate count of the users and 
> products in order to determine the most efficient way to proceed. In many 
> cases the answer is known in advance (typically there are more users 
> than products by an order of magnitude), so this check is unnecessary. Adding 
> a parameter to this predict method that lets the caller choose the 
> implementation (and skip the check) would be nice.
> It would be especially nice in development cycles, when you are repeatedly 
> tweaking your model and the pairs you're predicting for, and this 
> approximate count represents a meaningful portion of the time you wait for 
> results.
> I can provide a pull request with this ability added that preserves the 
> existing behavior.
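
To make the proposal above concrete, here is a minimal plain-Python sketch of the idea, not Spark's implementation. It assumes the count is only used to decide which side (users or products) to join first. The names `choose_join_side` and `predict` and the `hint` parameter are hypothetical and do not exist in MLlib. Exact distinct counts on in-memory lists stand in for the approximate count on an RDD.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Features = Dict[int, List[float]]


def choose_join_side(pairs: List[Tuple[int, int]],
                     hint: Optional[str] = None) -> str:
    """Pick which feature table to join first: "users" or "products"."""
    if hint is not None:
        if hint not in ("users", "products"):
            raise ValueError(f"unknown hint: {hint!r}")
        # Hypothetical flag: the caller already knows the shape of the
        # data, so the counting pass is skipped entirely.
        return hint
    # Default path: count distinct ids on each side. An exact count is
    # used here as a stand-in for the approximate count in the ticket.
    users_count = len({user for user, _ in pairs})
    products_count = len({product for _, product in pairs})
    return "users" if users_count < products_count else "products"


def predict(user_features: Features,
            product_features: Features,
            pairs: Iterable[Tuple[int, int]],
            hint: Optional[str] = None) -> List[Tuple[int, int, float]]:
    """Score each (user, product) pair as a dot product of feature vectors."""
    pairs = list(pairs)
    side = choose_join_side(pairs, hint)
    ratings = []
    for user, product in pairs:
        # The lookup order mirrors the chosen join order. It does not
        # change the result, only (in a distributed setting) the cost.
        if side == "users":
            first, second = user_features.get(user), product_features.get(product)
        else:
            first, second = product_features.get(product), user_features.get(user)
        if first is None or second is None:
            continue  # inner-join semantics: drop pairs with unknown ids
        ratings.append((user, product, sum(a * b for a, b in zip(first, second))))
    return ratings
```

With `hint=None` the sketch behaves as before and pays for the counting pass on every call; passing a hint trades that pass for the caller's own knowledge of the data, which is the trade-off under discussion in this thread.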



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)