Your understanding is correct: when used without centering (withMean = false), the two implementations differ:

* R: normalizes each column by its root-mean-square (RMS)
* MLlib: normalizes each column by its standard deviation

With centering, they are the same.
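To make the difference concrete, here is a small plain-Scala sketch (no Spark needed; it assumes both sides use the n - 1 denominator, which matches the R scale() docs and MLlib's sample variance):

val x = Array(1.0, 2.0, 4.0)          // toy column with nonzero mean
val n = x.length
val mean = x.sum / n

// R's scale(x, center = FALSE, scale = TRUE): divide by the
// root-mean-square, sqrt(sum(x^2) / (n - 1)).
val rms = math.sqrt(x.map(v => v * v).sum / (n - 1))

// MLlib's StandardScaler(withMean = false, withStd = true): divide by
// the sample standard deviation, sqrt(sum((x - mean)^2) / (n - 1)).
val sd = math.sqrt(x.map(v => (v - mean) * (v - mean)).sum / (n - 1))

val likeR     = x.map(_ / rms) // ~ 0.309, 0.617, 1.234
val likeMLlib = x.map(_ / sd)  // ~ 0.655, 1.309, 2.619

// After centering, the column has mean 0, so rms == sd and the two
// results coincide.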
It's hard to say which one is better a priori, but my guess is that most R users center their data. (Centering is nice to do, except on big data, where it makes the vectors dense.) Note that R does allow you to normalize by the standard deviation without centering:
https://stat.ethz.ch/R-manual/R-devel/library/base/html/scale.html

A short snippet for reproducing the MLlib side follows the quoted message below.

Joseph

On Tue, Jun 2, 2015 at 1:25 AM, RoyGaoVLIS <[email protected]> wrote:

> Hi,
> While adding a test case for ML's StandardScaler, I found that MLlib's
> StandardScaler's output differs from R's with params (withMean = false,
> withStd = true), because R's scale function divides each column by the
> root-mean-square rather than the standard deviation.
> I'm confused about Spark MLlib's implementation. Can anybody give me a
> hand? Thanks.
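For anyone who wants to reproduce the MLlib side of the comparison, a minimal sketch against the RDD-based mllib API (assuming a live SparkContext named sc; the same toy column as above, one value per row):

import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.linalg.Vectors

val data = sc.parallelize(Seq(1.0, 2.0, 4.0).map(v => Vectors.dense(v)))

// fit() computes column summary statistics; transform() divides by the
// sample standard deviation, so the output will not match R's
// scale(x, center = FALSE, scale = TRUE).
val model = new StandardScaler(withMean = false, withStd = true).fit(data)
model.transform(data).collect().foreach(println)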
