[
https://issues.apache.org/jira/browse/SPARK-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Patrick Wendell resolved SPARK-1328.
------------------------------------
Resolution: Fixed
> Current implementation of Standard Deviation in MLUtils may cause
> catastrophic cancellation and loss of precision.
> ----------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-1328
> URL: https://issues.apache.org/jira/browse/SPARK-1328
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 0.9.0
> Reporter: Xusen Yin
> Assignee: Xusen Yin
> Labels: MLLib, statistics, vector
> Fix For: 1.0.0
>
>
> Standard deviation (SD) is used for dataset normalization, which is useful
> when training Lasso and similar models. The current implementation of SD
> uses the second-order expectation formula E(x^2) - E(x)^2, which is
> numerically unstable in floating-point arithmetic: when the two terms are
> close in magnitude, the subtraction cancels most of the significant digits.
> Instead, the first-order equation performs better.
> Moreover, MLUtils is not the right place to hold standard statistics
> methods; they fit better in VectorRDDFunctions. Other affected machine
> learning algorithms should also be refined.
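
The cancellation described above can be reproduced with a small standalone sketch. It is not the code from MLUtils or from the eventual fix: the function names `std_naive` and `std_welford` and the sample data are hypothetical, and Welford's single-pass update is used here only as one well-known stable alternative, on the assumption that it is comparable in spirit to the formulation the reporter has in mind.

```python
import math


def std_naive(xs):
    """Population SD via E(x^2) - E(x)^2 (the unstable formula)."""
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    var = mean_sq - mean * mean
    # Rounding error can push var below zero, so clamp before the sqrt.
    return math.sqrt(max(var, 0.0))


def std_welford(xs):
    """Population SD via Welford's single-pass update (illustrative choice)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return math.sqrt(m2 / n)


# Hypothetical data: a small spread sitting on top of a large offset.
# The exact population variance is 22.5.
data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]

print("naive  :", std_naive(data))
print("welford:", std_welford(data))
```

With values near 1e9 the squares are near 1e18, where adjacent doubles are 128 apart, so the naive difference cannot represent a variance of 22.5 at all, while the running update works with small deviations and keeps full precision.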