For cases like this, I often just standardize (divide by the standard deviation) without centering, and I still get good results.
Sent from my Google Nexus 5

On May 28, 2014 7:03 PM, "Xiangrui Meng" <men...@gmail.com> wrote:
> RowMatrix has a method to compute column summary statistics. There is
> a trade-off here because centering may densify the data. A utility
> function that centers data would be useful for dense datasets.
> -Xiangrui
>
> On Wed, May 28, 2014 at 5:03 AM, dataginjaninja
> <rickett.stepha...@gmail.com> wrote:
> > I searched on this but didn't find anything general, so I apologize if this
> > has been addressed.
> >
> > Many algorithms (SGD, SVM, ...) either will not converge or will run forever
> > if the data is not scaled. Scikit-learn has preprocessing
> > <http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.scale.html>
> > that will subtract the mean and divide by the standard deviation. Of course
> > there are a few options with it as well.
> >
> > Is there something in the works for this?