I do see the issue with centering sparse data. In practice, though, centering
is less important than scaling by the standard deviation. Features that do
not have unit variance are what cause the convergence issues and long runtimes.
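
To make the point concrete, here is a minimal sketch of scaling sparse rows to unit variance without centering them. It is plain Python rather than MLlib code, and the names (column_variances, scale_to_unit_variance) and the dict-per-row layout are assumptions made up for illustration. Because each stored value is only multiplied by a per-column factor, zeros stay zero and the data stays sparse.

```python
import math

def column_variances(rows, num_cols):
    """Sample variance per column. Each row is a {col_index: value} dict;
    entries that are absent count as 0.0."""
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two rows for a sample variance")
    sums = [0.0] * num_cols
    sq_sums = [0.0] * num_cols
    for row in rows:
        for j, v in row.items():  # only stored (non-zero) entries are visited
            sums[j] += v
            sq_sums[j] += v * v
    variances = []
    for j in range(num_cols):
        mean = sums[j] / n
        # Clamp at 0.0 to absorb small negative values from rounding.
        variances.append(max(0.0, (sq_sums[j] - n * mean * mean) / (n - 1)))
    return variances

def scale_to_unit_variance(rows, num_cols):
    """Divide each stored value by its column's standard deviation.
    No mean is subtracted, so no new non-zero entries are created."""
    variances = column_variances(rows, num_cols)
    # Constant columns (zero variance) are left unscaled.
    factors = [1.0 / math.sqrt(v) if v > 0.0 else 1.0 for v in variances]
    return [{j: v * factors[j] for j, v in row.items()} for row in rows]

rows = [{0: 2.0}, {0: 4.0, 1: 1.0}, {1: 3.0}, {}]
scaled = scale_to_unit_variance(rows, 2)
```

The variance is computed in one pass from the sum and the sum of squares, which is simple but can lose precision when a column's mean is large relative to its spread.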

Will RowMatrix compute the variance of a column?



