Eyal Allweil created SPARK-18781:
------------------------------------

             Summary: Allow MatrixFactorizationModel.predict to skip user/product approximation count
                 Key: SPARK-18781
                 URL: https://issues.apache.org/jira/browse/SPARK-18781
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
            Reporter: Eyal Allweil


When 
[MatrixFactorizationModel.predict|https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.html#predict(org.apache.spark.rdd.RDD)]
 is called, it first computes an approximate count of the users and products 
in order to choose the more efficient way to proceed. In many cases the 
answer is known in advance (typically there are more users than products 
by an order of magnitude), so this check is unnecessary. Adding a parameter to 
this predict method that lets the caller choose the implementation (and skip 
the check) would be useful.
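
To make the request concrete, here is a minimal sketch of the idea in plain Python, without Spark. Everything in it is hypothetical: the names JoinOrder, choose_join_order and hint are mine and are not part of the MLlib API, the exact set sizes stand in for the approximate count, and the rule "start from the side with fewer ids" is an assumption about how the choice is made.

```python
from enum import Enum
from typing import Dict, List, Optional, Sequence, Tuple


class JoinOrder(Enum):
    """Hypothetical names for the two strategies predict chooses between."""
    USERS_FIRST = "users_first"        # join the user features first
    PRODUCTS_FIRST = "products_first"  # join the product features first


def choose_join_order(pairs: Sequence[Tuple[int, int]],
                      hint: Optional[JoinOrder] = None) -> JoinOrder:
    """Pick a strategy, counting ids only when the caller gives no hint."""
    if hint is not None:
        # The caller already knows the answer, so the counting pass is skipped.
        return hint
    # Stand-in for the approximate count: exact distinct counts on a local list.
    user_count = len({user for user, _ in pairs})
    product_count = len({product for _, product in pairs})
    # Assumed rule: start from whichever side has fewer ids.
    if user_count < product_count:
        return JoinOrder.USERS_FIRST
    return JoinOrder.PRODUCTS_FIRST


def predict(user_features: Dict[int, List[float]],
            product_features: Dict[int, List[float]],
            pairs: Sequence[Tuple[int, int]],
            hint: Optional[JoinOrder] = None):
    """Return the chosen strategy and one (user, product, rating) per pair."""
    order = choose_join_order(pairs, hint)
    # Locally both strategies give the same ratings; in a distributed job the
    # order would only change which side is joined first.
    ratings = [
        (user, product,
         sum(a * b for a, b in zip(user_features[user], product_features[product])))
        for user, product in pairs
    ]
    return order, ratings
```

Calling predict(..., hint=JoinOrder.PRODUCTS_FIRST) skips the count entirely, while omitting hint keeps today's behavior, which is the backward-compatible shape I have in mind.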

This would be especially helpful during development cycles, when you are 
repeatedly tweaking your model and the pairs you are predicting for, and the 
approximate count accounts for a meaningful portion of the time you wait for 
results.

I can provide a pull request that adds this option while preserving the 
existing behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)