[ 
https://issues.apache.org/jira/browse/SPARK-17721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534375#comment-15534375
 ] 

Joseph K. Bradley commented on SPARK-17721:
-------------------------------------------

OK, I did an audit: this will not have affected any algorithms in 2.0 or 
before, but it will affect sparse logistic regression in 2.1!  Thanks for 
finding this bug.

If users have called Matrix.multiply directly, then they could be affected.

> Erroneous computation in multiplication of transposed SparseMatrix with 
> SparseVector
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-17721
>                 URL: https://issues.apache.org/jira/browse/SPARK-17721
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, MLlib
>    Affects Versions: 1.4.1, 1.5.2, 1.6.2, 2.0.0
>         Environment: Verified on OS X with Spark 1.6.1 and on Databricks 
> running Spark 1.6.1
>            Reporter: Bjarne Fruergaard
>            Assignee: Bjarne Fruergaard
>            Priority: Critical
>              Labels: correctness
>             Fix For: 2.0.2, 2.1.0
>
>
> There is a bug in how a transposed SparseMatrix (isTransposed=true) is 
> multiplied by a SparseVector. The bug is present (for versions >= 2.0.0) in 
> both org.apache.spark.mllib.linalg.BLAS (mllib) and 
> org.apache.spark.ml.linalg.BLAS (mllib-local) in the private gemv method with 
> signature:
> bq. gemv(alpha: Double, A: SparseMatrix, x: SparseVector, beta: Double, y: 
> DenseVector).
> This bug can be verified by running the following snippet in a Spark shell 
> (here using v1.6.1):
> {code:java}
> import org.apache.spark.mllib.linalg._
> // 2x3 sparse matrix obtained by transposing a 3x2 matrix (isTransposed = true)
> val A = Matrices.dense(3, 2, Array[Double](0, 2, 1, 1, 2, 0)).asInstanceOf[DenseMatrix].toSparse.transpose
> // Sparse vector [0.0, 2.0, 1.0]
> val b = Vectors.sparse(3, Seq[(Int, Double)]((1, 2), (2, 1))).asInstanceOf[SparseVector]
> A.multiply(b)          // SparseMatrix times SparseVector
> A.multiply(b.toDense)  // SparseMatrix times DenseVector
> {code}
> The first multiply, with the SparseVector, returns the incorrect result:
> {code:java}
> org.apache.spark.mllib.linalg.DenseVector = [5.0,0.0]
> {code}
> whereas the correct result is returned by the second multiply:
> {code:java}
> org.apache.spark.mllib.linalg.DenseVector = [5.0,4.0]
> {code}
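
For reference, the expected value can be checked by hand without Spark. The
following is a minimal plain-Scala sketch, not Spark's BLAS code: it assumes
the transposed matrix is stored row by row (CSR-style arrays) and walks each
row's column indices against the sparse vector's indices. The object, method,
and parameter names are hypothetical, and the arrays in main were derived by
hand from the matrix and vector in the reproduction above.

```scala
// Plain-Scala sketch with no Spark dependency; all names are hypothetical.
object CsrTimesSparseVector {

  /** Computes y = A * x, where A is stored row by row (CSR) and x is sparse.
    * Both colIndices (within each row) and xIndices must be sorted ascending.
    */
  def multiply(
      numRows: Int,
      rowPtrs: Array[Int],
      colIndices: Array[Int],
      values: Array[Double],
      xIndices: Array[Int],
      xValues: Array[Double]): Array[Double] = {
    val y = new Array[Double](numRows)
    var row = 0
    while (row < numRows) {
      var i = rowPtrs(row)          // cursor into this row's stored entries
      val rowEnd = rowPtrs(row + 1)
      var k = 0                     // cursor into the sparse vector, reset per row
      var sum = 0.0
      while (i < rowEnd && k < xIndices.length) {
        if (colIndices(i) == xIndices(k)) {
          // Matching column: this pair contributes to the dot product.
          sum += values(i) * xValues(k)
          i += 1
          k += 1
        } else if (colIndices(i) < xIndices(k)) {
          i += 1
        } else {
          k += 1
        }
      }
      y(row) = sum
      row += 1
    }
    y
  }

  def main(args: Array[String]): Unit = {
    // A = [[0, 2, 1], [1, 2, 0]] stored by rows; b = [0, 2, 1] stored sparsely.
    val y = multiply(
      numRows = 2,
      rowPtrs = Array(0, 2, 4),
      colIndices = Array(1, 2, 0, 1),
      values = Array(2.0, 1.0, 1.0, 2.0),
      xIndices = Array(1, 2),
      xValues = Array(2.0, 1.0))
    println(y.mkString("[", ",", "]")) // prints [5.0,4.0]
  }
}
```

Row 0 contributes 2*2 + 1*1 = 5 and row 1 contributes 2*2 = 4, which agrees
with the result of the second multiply in the report.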



