[ https://issues.apache.org/jira/browse/SPARK-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengbing li updated SPARK-2257:
--------------------------------

    Description: 
When I test the ALS algorithm on the Netflix data, I cannot reproduce the
accurate results reported in the paper. The best MSE value I get is
0.9066300038109709 (RMSE 0.952), which is worse than the paper's result. If I
increase the number of features or the number of iterations, the result gets
worse. After studying the paper and the source code, I found a bug in the
updateBlock function of ALS.

The original code is:
    while (i < rank) {
      // ---
      fullXtX.data(i * rank + i) += lambda
      i += 1
    }

This code does not take into account the number of products that each user
has rated, so it should be changed to:
    while (i < rank) {
      // ratingsNum(index) is the number of products that this user has rated
      fullXtX.data(i * rank + i) += lambda * ratingsNum(index)
      i += 1
    }
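
As a minimal sketch of the difference between the two loops (this is not the
actual MLlib code: the object name WeightedLambdaSketch, the helper
addRegularization, and the plain Array[Double] in row-major layout are
assumptions made only for illustration), the change amounts to adding
lambda * n to each diagonal entry of XtX, where n is the number of ratings
for the current user, instead of adding lambda alone:

```scala
// Hypothetical helper, not part of MLlib. It adds the regularization term to
// the diagonal of a rank x rank matrix stored row-major in a flat array.
object WeightedLambdaSketch {
  def addRegularization(
      xtx: Array[Double],
      rank: Int,
      lambda: Double,
      numRatings: Int): Unit = {
    require(xtx.length == rank * rank, "xtx must hold rank * rank entries")
    var i = 0
    while (i < rank) {
      // The diagonal entry (i, i) lives at flat index i * rank + i.
      xtx(i * rank + i) += lambda * numRatings
      i += 1
    }
  }

  def main(args: Array[String]): Unit = {
    val rank = 2
    val xtx = Array(1.0, 0.5, 0.5, 2.0)
    // A user with 10 ratings gets ten times the penalty of a user with one.
    addRegularization(xtx, rank, lambda = 0.06, numRatings = 10)
    println(xtx.mkString(", "))
  }
}
```

Passing numRatings = 1 reproduces the behaviour of the original loop, so the
two variants differ only in how strongly users with many ratings are
regularized.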

After I modify the code, the MSE value decreases. This is one test result.
Conditions:
val numIterations = 20
val features = 30
val model = ALS.train(trainRatings, features, numIterations, 0.06)

Result of the modified version:
MSE: Double = 0.8472313396478773
RMSE: 0.92045

Result of version 1.0:
MSE: Double = 1.2680743123043832
RMSE: 1.1261
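
For reference, the RMSE figures are the square roots of the MSE figures. The
following is a small sketch of how such metrics can be computed from
(predicted, actual) pairs with plain Scala collections. The object name
ErrorMetricsSketch and the pair representation are assumptions for
illustration, and this is not necessarily how the numbers above were
produced:

```scala
// Hypothetical helper for illustration; it uses plain collections, not RDDs.
object ErrorMetricsSketch {
  // Mean squared error over (predicted, actual) pairs.
  def mse(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one (predicted, actual) pair")
    pairs.map { case (predicted, actual) =>
      val err = predicted - actual
      err * err
    }.sum / pairs.size
  }

  // Root mean squared error is the square root of the MSE.
  def rmse(pairs: Seq[(Double, Double)]): Double = math.sqrt(mse(pairs))

  def main(args: Array[String]): Unit = {
    // Square roots of the two MSE values reported above.
    println(math.sqrt(0.8472313396478773))
    println(math.sqrt(1.2680743123043832))
  }
}
```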

In order to add the vector ratingsNum, I want to change the InLinkBlock
structure as follows:
private[recommendation] case class InLinkBlock(
    elementIds: Array[Int],
    ratingsNum: Array[Int],
    ratingsForBlock: Array[Array[(Array[Int], Array[Double])]])
Then I can calculate the vector ratingsNum in the makeInLinkBlock function.
This is the code I added to makeInLinkBlock:

........... 
// added
  val ratingsNum = new Array[Int](numUsers)
  // foreach rather than map: the loop is run only for its side effect
  ratings.foreach(r => ratingsNum(userIdToPos(r.user)) += 1)
// end of added code
  InLinkBlock(userIds, ratingsNum, ratingsForBlock)
........
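
The counting step can be tried in isolation with the sketch below. It is
self-contained and makes several assumptions that are labeled in the code:
the Rating case class is a local stand-in for the MLlib one, and userIds and
userIdToPos are built here from the ratings themselves, which may differ from
how makeInLinkBlock builds them:

```scala
// Hypothetical, self-contained sketch; not the actual makeInLinkBlock code.
object RatingsNumSketch {
  // Local stand-in for MLlib's Rating so the example needs no Spark imports.
  case class Rating(user: Int, product: Int, rating: Double)

  // Returns the sorted distinct user ids and, at the same positions,
  // the number of ratings each of those users has given.
  def countRatingsPerUser(ratings: Array[Rating]): (Array[Int], Array[Int]) = {
    val userIds = ratings.map(_.user).distinct.sorted
    val userIdToPos = userIds.zipWithIndex.toMap
    val ratingsNum = new Array[Int](userIds.length)
    // One increment per rating, at the position of the rating's user.
    ratings.foreach(r => ratingsNum(userIdToPos(r.user)) += 1)
    (userIds, ratingsNum)
  }

  def main(args: Array[String]): Unit = {
    val ratings = Array(Rating(7, 1, 5.0), Rating(3, 2, 4.0), Rating(7, 3, 1.0))
    val (userIds, ratingsNum) = countRatingsPerUser(ratings)
    userIds.zip(ratingsNum).foreach { case (user, n) =>
      println(s"user $user rated $n product(s)")
    }
  }
}
```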


Is this solution reasonable?


> The algorithm of ALS in mlib lacks a parameter 
> -----------------------------------------------
>
>                 Key: SPARK-2257
>                 URL: https://issues.apache.org/jira/browse/SPARK-2257
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.0.0
>         Environment: spark 1.0
>            Reporter: zhengbing li
>              Labels: patch
>             Fix For: 1.1.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h



--
This message was sent by Atlassian JIRA
(v6.2#6252)
