As you've pointed out, Rating requires user and item ids in Int form. So
you will need to map String user ids to integers.

See this thread for example:
https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAJgQjQ9GhGqpg1=hvxpfrs+59elfj9f7knhp8nyqnh1ut_6...@mail.gmail.com%3E
.
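
The String-to-Int mapping mentioned above could look roughly like the sketch below. It is not taken from the linked thread. The sample rows, the app name and the names `raw`, `userIds`, `userIdMap` and `idToName` are made up for illustration, and it assumes the item column already holds integers and that the set of distinct names is small enough to collect and broadcast.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

val conf = new SparkConf().setAppName("StringIdsToInt").setMaster("local[2]")
val sc = new SparkContext(conf)

// Hypothetical parsed rows of (name, item, rating), standing in for the CSV.
val raw = sc.parallelize(Seq(
  ("foo", "1", 4.0),
  ("foo", "2", 2.0),
  ("bar", "1", 5.0),
  ("bar", "3", 1.0)
))

// Give each distinct name a contiguous index. zipWithIndex returns a Long,
// so toInt assumes there are fewer than Int.MaxValue distinct names.
val userIds = raw.map(_._1).distinct().zipWithIndex().mapValues(_.toInt)

// Assumption: the lookup table fits in memory on the driver and executors.
// For a very large id set, joining against userIds would be safer.
val userIdMap = sc.broadcast(userIds.collectAsMap())

val ratings = raw.map { case (name, item, rating) =>
  Rating(userIdMap.value(name), item.toInt, rating)
}

// rank = 2, iterations = 10, lambda = 0.01 are arbitrary illustrative values.
val model = ALS.train(ratings, 2, 10, 0.01)

// Inverse lookup, to translate integer ids in the results back to names.
val idToName = userIds.map(_.swap).collectAsMap()
```

Keeping `idToName` around matters because the model's recommendations come back in terms of the integer ids.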

There is a DeveloperApi method, ALS.train, in
org.apache.spark.ml.recommendation.ALS that takes Ratings with a generic id
type (which can be String) for the user id and item id. However, that is a
little more involved, and for larger-scale data it will be a lot less efficient.

Something like this for example:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.ml.recommendation.ALS.Rating

val conf = new SparkConf().setAppName("ALSWithStringID").setMaster("local[4]")
val sc = new SparkContext(conf)
// Rows follow the CSV layout Name,Value1,Value2, used here as (user, item, rating).
val rdd = sc.parallelize(Seq(
  Rating[String]("foo", "1", 4.0f),
  Rating[String]("foo", "2", 2.0f),
  Rating[String]("bar", "1", 5.0f),
  Rating[String]("bar", "3", 1.0f)
))
// Trains with the default parameters and returns the factor RDDs,
// keyed by the original String ids.
val (userFactors, itemFactors) = ALS.train(rdd)


As you can see, you just get the factor RDDs back, and if you want an
ALSModel you will have to construct it yourself.
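
If all you need is predictions, one option is to score directly from the two factor RDDs instead of building a model object. This is a rough sketch, not part of the Spark API: `dot` and `predict` are hypothetical helper names, and it assumes String ids as in the example above. The predicted rating is the dot product of the user's and the item's factor vectors.

```scala
import org.apache.spark.rdd.RDD

// Dot product of two factor vectors of equal length.
def dot(a: Array[Float], b: Array[Float]): Float = {
  require(a.length == b.length, "factor vectors must have the same rank")
  var sum = 0.0f
  var i = 0
  while (i < a.length) {
    sum += a(i) * b(i)
    i += 1
  }
  sum
}

// Hypothetical helper: predicted rating for each requested (user, item) pair.
// Pairs whose user or item has no factor vector are dropped by the inner joins.
def predict(
    userFactors: RDD[(String, Array[Float])],
    itemFactors: RDD[(String, Array[Float])],
    pairs: RDD[(String, String)]): RDD[((String, String), Float)] = {
  pairs
    .join(userFactors) // (user, (item, userVec))
    .map { case (user, (item, userVec)) => (item, (user, userVec)) }
    .join(itemFactors) // (item, ((user, userVec), itemVec))
    .map { case (item, ((user, userVec), itemVec)) =>
      ((user, item), dot(userVec, itemVec))
    }
}
```

With the factors from the example, `predict(userFactors, itemFactors, sc.parallelize(Seq(("foo", "3"))))` would give a score for a pair that was not in the training data. The two joins shuffle data, so for a small item set it may be cheaper to broadcast the item factors instead.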


On Sun, 6 Mar 2016 at 18:23 Shishir Anshuman <shishiranshu...@gmail.com>
wrote:

> I am new to apache Spark, and I want to implement the Alternating Least
> Squares algorithm. The data set is stored in a csv file in the format:
> *Name,Value1,Value2*.
>
> When I read the csv file, I get
> *java.lang.NumberFormatException.forInputString* error because the Rating
> class needs the parameters in the format: *(user: Int, product: Int,
> rating: Double)* and the first column of my file contains *Name*.
>
> Please suggest me a way to overcome this issue.
>
