[ https://issues.apache.org/jira/browse/SPARK-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15021729#comment-15021729 ]
Yanbo Liang edited comment on SPARK-11918 at 11/23/15 8:44 AM:
---------------------------------------------------------------

Furthermore, I used the breeze library to train the model locally with the normal equation method.

{code}
import breeze.linalg._
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.functions.col
import sqlCtx.implicits._

val df = MLUtils.loadLibSVMFile(sqlCtx.sparkContext,
  "/Users/yanboliang/data/trunk/spark/data/mllib/sample_libsvm_data.txt").toDF()

// Flatten the feature vectors row by row into a single array.
val features = df.select(col("features")).map { r => r.getAs[Vector](0) }.collect().flatMap { v => v.toArray }
val labelArray = df.select(col("label")).map { r => r.getDouble(0) }.collect()

// DenseMatrix is column-major, so each sample becomes one column: Xt is 692 x 100.
val Xt = new DenseMatrix[Double](692, 100, features)
val X = Xt.t
val y = new DenseMatrix[Double](100, 1, labelArray)

// Normal equation: coefs = (X^T X)^-1 X^T y
val XtXi = inv(Xt * X)
val XtY = Xt * y
val coefs = XtXi * XtY
println(coefs.toString)
{code}

It also throws an exception:

{code}
breeze.linalg.MatrixSingularException:
  at breeze.linalg.inv$$anon$1.apply(inv.scala:36)
  at breeze.linalg.inv$$anon$1.apply(inv.scala:19)
  at breeze.generic.UFunc$class.apply(UFunc.scala:48)
  at breeze.linalg.inv$.apply(inv.scala:17)
{code}

breeze.linalg.inv also calls the netlib LAPACK library, the same one Spark uses. Tracing the breeze code, we can see that this exception is thrown here (https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/functions/inv.scala#L33), and it is also caused by the underlying LAPACK error.

was (Author: yanboliang):
Furthermore, I used the breeze library to train the model locally with the normal equation method.
{code}
import breeze.linalg._
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.functions.col
import sqlCtx.implicits._

val df = MLUtils.loadLibSVMFile(sqlCtx.sparkContext,
  "/Users/yanboliang/data/trunk/spark/data/mllib/sample_libsvm_data.txt").toDF()

// Flatten the feature vectors row by row into a single array.
val features = df.select(col("features")).map { r => r.getAs[Vector](0) }.collect().flatMap { v => v.toArray }
val labelArray = df.select(col("label")).map { r => r.getDouble(0) }.collect()

// DenseMatrix is column-major, so each sample becomes one column: Xt is 692 x 100.
val Xt = new DenseMatrix[Double](692, 100, features)
val X = Xt.t
val y = new DenseMatrix[Double](100, 1, labelArray)

// Normal equation: coefs = (X^T X)^-1 X^T y
val XtXi = inv(Xt * X)
val XtY = Xt * y
val coefs = XtXi * XtY
println(coefs.toString)
{code}

It also throws an exception:

{code}
breeze.linalg.MatrixSingularException:
  at breeze.linalg.inv$$anon$1.apply(inv.scala:36)
  at breeze.linalg.inv$$anon$1.apply(inv.scala:19)
  at breeze.generic.UFunc$class.apply(UFunc.scala:48)
  at breeze.linalg.inv$.apply(inv.scala:17)
{code}

breeze.linalg.inv also calls the netlib LAPACK package, which is the same library Spark uses. Tracing the breeze code, we can see that this exception is thrown here (https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/functions/inv.scala#L33), which is also caused by the underlying LAPACK error.

> WLS can not resolve some kinds of equation
> ------------------------------------------
>
>                 Key: SPARK-11918
>                 URL: https://issues.apache.org/jira/browse/SPARK-11918
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>            Reporter: Yanbo Liang
>         Attachments: R_GLM_output
>
>
> Weighted Least Squares (WLS) is one of the optimization methods for solving Linear Regression (when #feature < 4096). But if the dataset is very ill-conditioned (such as a 0-1 based label used for classification, where the equation is underdetermined), WLS fails (while "l-bfgs" can still train and produce a model). The failure is caused by the underlying LAPACK library returning an error value during Cholesky decomposition.
> This issue is easy to reproduce: train a LinearRegressionModel with the "normal" solver on the example dataset (https://github.com/apache/spark/blob/master/data/mllib/sample_libsvm_data.txt).
> The following is the exception:
> {code}
> assertion failed: lapack.dpotrs returned 1.
> java.lang.AssertionError: assertion failed: lapack.dpotrs returned 1.
>   at scala.Predef$.assert(Predef.scala:179)
>   at org.apache.spark.mllib.linalg.CholeskyDecomposition$.solve(CholeskyDecomposition.scala:42)
>   at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:117)
>   at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:180)
>   at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:67)
>   at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
> {code}
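
For illustration only, here is a minimal sketch of why the factorization has to fail for this kind of data. Going by the DenseMatrix dimensions in the comment, the sample has 100 rows and 692 features, so X^T X is a 692 x 692 matrix of rank at most 100 and cannot be positive definite. The sketch below does not use Spark, breeze or LAPACK. It is a plain textbook Cholesky written for this example, and the names CholeskyDemo, gram and cholesky, the tolerance value, and the two tiny matrices are all assumptions made up for the demonstration.

```scala
object CholeskyDemo {
  // Gram matrix X^T X for a design matrix X given as an array of rows.
  def gram(x: Array[Array[Double]]): Array[Array[Double]] = {
    val d = x.head.length
    Array.tabulate(d, d) { (i, j) => x.map(row => row(i) * row(j)).sum }
  }

  // Textbook Cholesky factorization A = L * L^T.
  // Returns None as soon as a pivot is not strictly positive, which means
  // A is not (numerically) positive definite. The tolerance is an assumption
  // of this sketch, not what LAPACK uses.
  def cholesky(a: Array[Array[Double]], tol: Double = 1e-12): Option[Array[Array[Double]]] = {
    val n = a.length
    val l = Array.ofDim[Double](n, n)
    var j = 0
    var ok = true
    while (ok && j < n) {
      val pivot = a(j)(j) - (0 until j).map(k => l(j)(k) * l(j)(k)).sum
      if (pivot <= tol) {
        ok = false
      } else {
        l(j)(j) = math.sqrt(pivot)
        for (i <- j + 1 until n) {
          val dot = (0 until j).map(k => l(i)(k) * l(j)(k)).sum
          l(i)(j) = (a(i)(j) - dot) / l(j)(j)
        }
        j += 1
      }
    }
    if (ok) Some(l) else None
  }

  def main(args: Array[String]): Unit = {
    // Underdetermined: 2 samples, 3 features, and one feature that is always zero.
    val wide = Array(Array(1.0, 0.0, 2.0), Array(3.0, 0.0, 4.0))
    // Overdetermined: 3 samples, 2 features, full column rank.
    val tall = Array(Array(1.0, 0.0), Array(0.0, 1.0), Array(1.0, 1.0))
    println(s"wide X: factorization succeeded = ${cholesky(gram(wide)).isDefined}")
    println(s"tall X: factorization succeeded = ${cholesky(gram(tall)).isDefined}")
  }
}
```

With the wide matrix the second pivot is exactly zero, so cholesky returns None. The tall matrix has full column rank and factorizes without trouble. The always-zero feature column plays the same role as features that never occur in a small sample.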