Repository: spark
Updated Branches:
  refs/heads/branch-1.0 919ed3108 -> 92269f97c


[SPARK-1646] Micro-optimisation of ALS

This change replaces some Scala `for` and `foreach` constructs with `while` 
constructs.  There may be a slight performance gain on the order of 1-2% when 
training an ALS model.

I trained an ALS model on the Movielens 10M-rating dataset repeatedly both with 
and without these changes.  All 7 runs in both columns were done in a Scala 
`for` loop like this:

    for (iter <- 0 to 10) {
      val before = System.currentTimeMillis()
      val model = ALS.train(rats, 20, 10)
      val after = System.currentTimeMillis()
      println("%d ms".format(after-before))
      println("rmse %g".format(computeRmse(model, rats, numRatings)))
    }

The timings were done on a multiuser machine, and I stopped one set of timings 
after 7 had been completed.  It would be nice if somebody with dedicated 
hardware could confirm my timings.

    After           Before
    121980 ms       122041 ms
    117069 ms       117127 ms
    115332 ms       117523 ms
    115381 ms       117402 ms
    114635 ms       116550 ms
    114140 ms       114076 ms
    112993 ms       117200 ms

Ratios are about 1.0005, 1.0005, 1.019, 1.0175, 1.01671, 0.99944, and 1.03723.  
I therefore suspect these changes make for a slight performance gain on the 
order of 1-2%.

Author: Tor Myklebust <[email protected]>

Closes #568 from tmyklebu/alsopt and squashes the following commits:

5ded80f [Tor Myklebust] Fix style.
79595ff [Tor Myklebust] Fix style error.
4ef0313 [Tor Myklebust] Merge branch 'master' of github.com:apache/spark into 
alsopt
114fb74 [Tor Myklebust] Turn some 'for' loops into 'while' loops.
dcf583a [Tor Myklebust] Remove the partitioner member variable; instead, thread 
that needle everywhere it needs to go.
23d6f91 [Tor Myklebust] Stop making the partitioner configurable.
495784f [Tor Myklebust] Merge branch 'master' of https://github.com/apache/spark
674933a [Tor Myklebust] Fix style.
40edc23 [Tor Myklebust] Fix missing space.
f841345 [Tor Myklebust] Fix daft bug creating 'pairs', also for -> foreach.
5ec9e6c [Tor Myklebust] Clean a couple of things up using 'map'.
36a0f43 [Tor Myklebust] Make the partitioner private.
d872b09 [Tor Myklebust] Add negative id ALS test.
df27697 [Tor Myklebust] Support custom partitioners.  Currently we use the same 
partitioner for users and products.
c90b6d8 [Tor Myklebust] Scramble user and product ids before bucketing.
c774d7d [Tor Myklebust] Make the partitioner a member variable and use it 
instead of modding directly.

(cherry picked from commit 5c0cd5c1a594c181a3f7536639122ab7d97b271b)
Signed-off-by: Reynold Xin <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/92269f97
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/92269f97
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/92269f97

Branch: refs/heads/branch-1.0
Commit: 92269f97c1ad1a03f697c0a823610ada5aada3e9
Parents: 919ed31
Author: Tor Myklebust <[email protected]>
Authored: Tue Apr 29 22:04:34 2014 -0700
Committer: Reynold Xin <[email protected]>
Committed: Tue Apr 29 22:04:43 2014 -0700

----------------------------------------------------------------------
 .../apache/spark/mllib/recommendation/ALS.scala | 22 +++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/92269f97/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
----------------------------------------------------------------------
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
index 2a77e1a..0cf9a7f 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
@@ -472,13 +472,15 @@ class ALS private (
     // Compute the XtX and Xy values for each user by adding products it rated 
in each product
     // block
     for (productBlock <- 0 until numBlocks) {
-      for (p <- 0 until blockFactors(productBlock).length) {
+      var p = 0
+      while (p < blockFactors(productBlock).length) {
         val x = wrapDoubleArray(blockFactors(productBlock)(p))
         tempXtX.fill(0.0)
         dspr(1.0, x, tempXtX)
         val (us, rs) = inLinkBlock.ratingsForBlock(productBlock)(p)
-        for (i <- 0 until us.length) {
-          if (implicitPrefs) {
+        if (implicitPrefs) {
+          var i = 0
+          while (i < us.length) {
             // Extension to the original paper to handle rs(i) < 0. confidence 
is a function
             // of |rs(i)| instead so that it is never negative:
             val confidence = 1 + alpha * abs(rs(i))
@@ -489,11 +491,17 @@ class ALS private (
             if (rs(i) > 0) {
               SimpleBlas.axpy(confidence, x, userXy(us(i)))
             }
-          } else {
+            i += 1
+          }
+        } else {
+          var i = 0
+          while (i < us.length) {
             userXtX(us(i)).addi(tempXtX)
             SimpleBlas.axpy(rs(i), x, userXy(us(i)))
+            i += 1
           }
         }
+        p += 1
       }
     }
 
@@ -502,7 +510,11 @@ class ALS private (
       // Compute the full XtX matrix from the lower-triangular part we got 
above
       fillFullMatrix(userXtX(index), fullXtX)
       // Add regularization
-      (0 until rank).foreach(i => fullXtX.data(i*rank + i) += lambda)
+      var i = 0
+      while (i < rank) {
+        fullXtX.data(i * rank + i) += lambda
+        i += 1
+      }
       // Solve the resulting matrix, which is symmetric and positive-definite
       if (implicitPrefs) {
         Solve.solvePositive(fullXtX.addi(YtY.get.value), userXy(index)).data

Reply via email to