I'm trying to write a JavaPairRDD to a downstream database using
saveAsNewAPIHadoopFile with a custom OutputFormat and the process is pretty
slow.
Is there a way to boost the concurrency of the save process? For example,
something like splitting the RDD into multiple smaller RDDs and using Java
Question on the input and output for ALS.train() and
MatrixFactorizationModel.predict().
My input is a list of Ratings(user_id, product_id, rating) and my ratings are
on a scale of 1-5 (inclusive). When I compute predictions over the
superset of all (user_id, product_id) pairs, the ratings
less lambda, more features?
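
To make the "less lambda" part of that suggestion concrete, here is a toy sketch in plain Java. It assumes the concern is that predicted ratings come out pulled away from the 1-5 scale, which the truncated question does not confirm. It is not Spark's ALS implementation. It only shows the closed-form regularized least-squares update for one user with a single latent feature, and the class and method names (AlsLambdaSketch, solveUserFactor, predict) are made up for illustration.

```java
// Illustrative only: NOT Spark MLlib's ALS code. This is the textbook
// ridge-regression update for one user when rank = 1, used here to show
// how the regularization parameter lambda shrinks predictions toward 0.
final class AlsLambdaSketch {

    /**
     * Minimizes sum_i (r_i - u * v_i)^2 + lambda * u^2 over the scalar u.
     * Closed form: u = sum(r_i * v_i) / (sum(v_i^2) + lambda).
     */
    static double solveUserFactor(double[] ratings, double[] itemFactors, double lambda) {
        if (ratings.length != itemFactors.length) {
            throw new IllegalArgumentException("ratings and itemFactors must have equal length");
        }
        if (lambda < 0.0) {
            throw new IllegalArgumentException("lambda must be non-negative");
        }
        double numerator = 0.0;
        double denominator = lambda;
        for (int i = 0; i < ratings.length; i++) {
            numerator += ratings[i] * itemFactors[i];
            denominator += itemFactors[i] * itemFactors[i];
        }
        // With no information at all (all-zero factors and lambda = 0), fall back to 0.
        return denominator == 0.0 ? 0.0 : numerator / denominator;
    }

    /** Predicted rating for a rank-1 model: product of user and item factors. */
    static double predict(double userFactor, double itemFactor) {
        return userFactor * itemFactor;
    }

    public static void main(String[] args) {
        // Hypothetical data: one user who rated three items 5, 4 and 5,
        // with every item factor fixed at 1.0 for simplicity.
        double[] ratings = {5.0, 4.0, 5.0};
        double[] itemFactors = {1.0, 1.0, 1.0};
        for (double lambda : new double[] {0.01, 1.0, 10.0}) {
            double u = solveUserFactor(ratings, itemFactors, lambda);
            System.out.printf("lambda=%.2f -> predicted rating %.3f%n", lambda, predict(u, 1.0));
        }
    }
}
```

In this toy setup a larger lambda pulls the learned factor, and so the predicted rating, toward zero, while a smaller lambda lets it track the observed ratings more closely. The "more features" part corresponds to raising the rank argument of ALS.train(), which this one-feature sketch does not cover.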
On Tue, Jun 10, 2014 at 4:59 PM, Sandeep Parikh sand...@clusterbeep.org
wrote:
Question on the input and output for ALS.train() and
MatrixFactorizationModel.predict().
My input is a list of Ratings(user_id, product_id, rating) and my ratings are
on a scale of 1
Chen
On Wed, May 28, 2014 at 6:27 AM, Sandeep Parikh
sand...@clusterbeep.org wrote:
I've got a trained MatrixFactorizationModel via ALS.train(...) and now
I'm trying to use it to predict some ratings like so:
JavaRDD<Rating> predictions = model.predict(usersProducts.rdd())
Where usersProducts
I've got a trained MatrixFactorizationModel via ALS.train(...) and now I'm
trying to use it to predict some ratings like so:
JavaRDD<Rating> predictions = model.predict(usersProducts.rdd())
Where usersProducts is built from an existing Ratings dataset like so:
JavaPairRDD<Integer,Integer>