Re: Wrong initial bias in GraphX SVDPlusPlus?

2015-04-06 Thread Sean Owen
See now: https://issues.apache.org/jira/browse/SPARK-6710




-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Wrong initial bias in GraphX SVDPlusPlus?

2015-04-05 Thread Reynold Xin
Adding Jianping Wang to the thread, since he contributed the SVDPlusPlus
implementation.

Jianping,

Can you take a look at this message? Thanks.






Wrong initial bias in GraphX SVDPlusPlus?

2015-04-03 Thread Michael Malak
I believe that the bias initialization in GraphX SVDPlusPlus is incorrect.
Specifically, at line 96 of SVDPlusPlus.scala:
https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/SVDPlusPlus.scala#L96

instead of
(vd._1, vd._2, msg.get._2 / msg.get._1, 1.0 / scala.math.sqrt(msg.get._1))
it should be
(vd._1, vd._2, msg.get._2 / msg.get._1 - u, 1.0 / scala.math.sqrt(msg.get._1))

That is, the biases bu and bi (both stored as the third component of the
Tuple4 above, depending on whether the vertex is a user or an item),
described in equation (1) of the Koren paper, are supposed to be small offsets
from the mean (represented by the variable u, standing in for the Greek letter
mu) that account for peculiarities of individual users and items.
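
To make the proposed correction concrete, here is a minimal plain-Scala sketch with no Spark dependency. It is not the GraphX code: the object name, the helper names, and the sample ratings are all hypothetical, and the (count, sum) shape of the message is an assumption inferred from the expression msg.get._2 / msg.get._1 above.

```scala
// Hypothetical standalone sketch of the proposed bias initialization.
// This is NOT the GraphX implementation; names and data are made up.
object BiasInitSketch {
  // (userId, itemId, rating) triples: invented sample data.
  val ratings: Seq[(Int, Int, Double)] = Seq(
    (1, 10, 5.0),
    (1, 11, 3.0),
    (2, 10, 4.0),
    (2, 11, 2.0)
  )

  // Global mean rating, called u (for mu) in the thread.
  val u: Double = ratings.map(_._3).sum / ratings.size

  // Assumed message shape: (number of ratings, sum of ratings) per vertex.
  type Msg = (Long, Double)

  // Current behavior as quoted in the thread: the plain per-vertex average.
  def uncorrectedBias(msg: Msg): Double = msg._2 / msg._1

  // Proposed behavior: the per-vertex average expressed as an offset from u.
  def correctedBias(msg: Msg, mean: Double): Double = msg._2 / msg._1 - mean

  // Fourth tuple component from the thread: 1 / sqrt(number of ratings).
  def normTerm(msg: Msg): Double = 1.0 / scala.math.sqrt(msg._1.toDouble)

  // Aggregate a (count, sum) message for every user.
  val userMsgs: Map[Int, Msg] =
    ratings.groupBy(_._1).map { case (id, rs) =>
      id -> ((rs.size.toLong, rs.map(_._3).sum))
    }

  // Initial user biases under the proposed correction.
  val userBias: Map[Int, Double] =
    userMsgs.map { case (id, m) => id -> correctedBias(m, u) }
}
```

With this sample data the global mean is 3.5, so the corrected initial biases for users 1 and 2 are 0.5 and -0.5, while the uncorrected expression yields their full average ratings, 4.0 and 3.0. Since a prediction adds the bias on top of the mean, starting from the full averages would count the mean twice.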

In theory, initializing these biases to wrong values should not matter given
enough iterations of the algorithm, but some quick empirical testing shows that
it has trouble converging at all, even with orders of magnitude more
iterations.

This could be the source of the previously reported trouble with SVDPlusPlus:
http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-SVDPlusPlus-problem-td12885.html

If no one tells me I'm crazy within a day, I'll go ahead and create a Jira
ticket.
