[
https://issues.apache.org/jira/browse/SPARK-18781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741723#comment-15741723
]
Eyal Allweil commented on SPARK-18781:
--------------------------------------
I don't think remembering the decision after the first call is that useful,
because I suspect most people will only call the RDD prediction API once (after
assembling an RDD of all the predictions they want).
Here is a [possible pull
request|https://github.com/apache/spark/compare/master...eyala:SPARK-18781]
that preserves the existing behavior while adding two flags, _MoreUsers_ and
_MoreProducts_, that can be used to skip the count. I also added a test for this
API, since I couldn't find one.
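To make the idea concrete, here is a minimal Python sketch of the decision. It is not the code in the pull request and does not use Spark. The names `Strategy`, `choose_strategy` and `hint` are hypothetical, an exact distinct count stands in for the approximate one, and the tie-breaking rule is an assumption.

```python
from enum import Enum
from typing import Iterable, Optional, Tuple


class Strategy(Enum):
    """Hypothetical names mirroring the two flags described above."""
    MORE_USERS = "MoreUsers"
    MORE_PRODUCTS = "MoreProducts"


def choose_strategy(
    pairs: Iterable[Tuple[int, int]],
    hint: Optional[Strategy] = None,
) -> Strategy:
    """Pick which side of the (user, product) pairs is assumed to be larger.

    With a hint, the pairs are never inspected, so the counting pass is
    skipped. Without one, distinct users and products are counted first.
    """
    if hint is not None:
        # Caller already knows the answer: skip the count entirely.
        return hint

    # Stand-in for the approximate distinct count (exact here, for simplicity).
    users, products = set(), set()
    for user, product in pairs:
        users.add(user)
        products.add(product)

    # Assumption: ties are resolved in favour of MORE_USERS.
    if len(users) >= len(products):
        return Strategy.MORE_USERS
    return Strategy.MORE_PRODUCTS
```

Calling `choose_strategy(pairs)` keeps the counting behavior, while `choose_strategy(pairs, hint=Strategy.MORE_USERS)` returns immediately without touching `pairs`, which is the saving the flags are meant to provide.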
BTW - I'm a bit of a Scala newbie, so I hope the code I'm suggesting is fine.
> Allow MatrixFactorizationModel.predict to skip user/product approximation
> count
> -------------------------------------------------------------------------------
>
> Key: SPARK-18781
> URL: https://issues.apache.org/jira/browse/SPARK-18781
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: Eyal Allweil
> Priority: Minor
>
> When
> [MatrixFactorizationModel.predict|https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.html#predict(org.apache.spark.rdd.RDD)]
> is used, it first computes an approximate count of the users and
> products in order to determine the most efficient way to proceed. In many
> cases, the answer is known in advance (typically there are more users
> than products by an order of magnitude) and this check is unnecessary. Adding
> a parameter to this predict method to allow choosing the implementation (and
> skipping the check) would be nice.
> It would be especially nice in development cycles, when you are repeatedly
> tweaking your model and the pairs you're predicting for, and this
> approximate count accounts for a meaningful portion of the time you wait for
> results.
> I can provide a pull request with this ability added that preserves the
> existing behavior.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)