I do see the issue with centering sparse data. Actually, the centering is less important than the scaling by the standard deviation. Not having unit variance is what causes the convergence issues and long runtimes.
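
To make the idea concrete, here is a minimal sketch in plain Python (not MLlib code) of scaling sparse rows to unit variance without centering. The function names and the dict-based row format are assumptions for illustration only. Zero entries stay zero, so sparsity is preserved; only the stored values get multiplied by 1/std of their column.

```python
import math

def column_variances(rows, num_cols):
    """Sample variance of each column; each row is a sparse dict {col: value}."""
    n = len(rows)  # assumes at least two rows
    sums = [0.0] * num_cols
    sq_sums = [0.0] * num_cols
    for row in rows:
        for j, v in row.items():
            sums[j] += v
            sq_sums[j] += v * v
    # Implicit zeros add nothing to either sum; they only count toward n.
    return [(sq_sums[j] - sums[j] ** 2 / n) / (n - 1) for j in range(num_cols)]

def scale_to_unit_variance(rows, num_cols):
    """Divide each stored value by its column's standard deviation (no centering)."""
    variances = column_variances(rows, num_cols)
    # Constant columns (variance 0) are left unscaled to avoid dividing by zero.
    factors = [1.0 / math.sqrt(v) if v > 0 else 1.0 for v in variances]
    return [{j: v * factors[j] for j, v in row.items()} for row in rows]

rows = [{0: 2.0}, {0: 4.0, 1: 1.0}, {}, {1: 3.0}]
scaled = scale_to_unit_variance(rows, 2)
```

The sum-of-squares formula is used here only for brevity; it can lose precision when a column's mean is large relative to its spread, so a real implementation would likely use a more stable update.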
Will RowMatrix compute the variance of a column?