increasing concurrency of saveAsNewAPIHadoopFile?

2014-06-19 Thread Sandeep Parikh
I'm trying to write a JavaPairRDD to a downstream database using saveAsNewAPIHadoopFile with a custom OutputFormat, and the process is pretty slow. Is there a way to boost the concurrency of the save process? For example, something like splitting the RDD into multiple smaller RDDs and using Java

getting started with mllib.recommendation.ALS

2014-06-10 Thread Sandeep Parikh
Question on the input and output for ALS.train() and MatrixFactorizationModel.predict(). My input is a list of Ratings(user_id, product_id, rating) and my ratings are on a scale of 1-5 (inclusive). When I compute predictions over the superset of all (user_id, product_id) pairs, the ratings

Re: getting started with mllib.recommendation.ALS

2014-06-10 Thread Sandeep Parikh
less lambda, more features? On Tue, Jun 10, 2014 at 4:59 PM, Sandeep Parikh sand...@clusterbeep.org wrote: Question on the input and output for ALS.train() and MatrixFactorizationModel.predict(). My input is a list of Ratings(user_id, product_id, rating) and my ratings are on a scale of 1

Re: Java RDD structure for Matrix predict?

2014-05-28 Thread Sandeep Parikh
Chen On Wed, May 28, 2014 at 6:27 AM, Sandeep Parikh sand...@clusterbeep.org wrote: I've got a trained MatrixFactorizationModel via ALS.train(...) and now I'm trying to use it to predict some ratings like so: JavaRDD<Rating> predictions = model.predict(usersProducts.rdd()) Where usersProducts

Java RDD structure for Matrix predict?

2014-05-27 Thread Sandeep Parikh
I've got a trained MatrixFactorizationModel via ALS.train(...) and now I'm trying to use it to predict some ratings like so: JavaRDD<Rating> predictions = model.predict(usersProducts.rdd()) Where usersProducts is built from an existing Ratings dataset like so: JavaPairRDD<Integer,Integer>