[ https://issues.apache.org/jira/browse/SPARK-8884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph K. Bradley updated SPARK-8884:
-------------------------------------
Target Version/s: 1.6.0 (was: 1.5.0)
> 1-sample Anderson-Darling Goodness-of-Fit test
> ----------------------------------------------
>
> Key: SPARK-8884
> URL: https://issues.apache.org/jira/browse/SPARK-8884
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Jose Cambronero
> Priority: Minor
>
> We have implemented a 1-sample Anderson-Darling goodness-of-fit test to add
> to the existing hypothesis testing functionality. The current implementation
> supports the normal, exponential, Gumbel, logistic, and Weibull
> distributions. Users must provide the distribution parameters for all of
> these except normal and exponential, whose parameters are estimated from the
> data. Unlike other tests, such as the Kolmogorov-Smirnov test, we support
> only specific distributions, because the critical values depend on the
> distribution being tested.
> The distributed implementation of AD relies on the fact that, on a sorted
> data set, part of the statistic can be computed within each partition
> without knowing the global rank of its observations. Each partition also
> carries a small amount of additional information, which lets us correct the
> partial results once we have collected one result per partition.
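
To make the per-partition idea concrete, here is a minimal sketch in plain Python (no Spark). It is not the code from this issue. It assumes the usual form of the statistic, A^2 = -n - (1/n) * sum_i [(2i-1) ln F(x_i) + (2(n-i)+1) ln(1 - F(x_i))] over the sorted sample, and it assumes the partitions are contiguous slices of the sorted data. The names normal_cdf, ad_direct, partition_summary and combine are made up for illustration. Each partition uses only local ranks and returns four numbers, and the global rank offset is applied afterwards on the driver side.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ad_direct(sorted_xs, cdf):
    """Reference computation of A^2 on one machine, using global ranks."""
    n = len(sorted_xs)
    total = 0.0
    for i, x in enumerate(sorted_xs, start=1):
        f = cdf(x)
        total += (2 * i - 1) * math.log(f) + (2 * (n - i) + 1) * math.log(1.0 - f)
    return -n - total / n


def partition_summary(part, cdf):
    """Partial sums for one sorted partition, using only local ranks j = 1..m.

    Returns (m, a, b, c) where, with d_j = ln F(x_j) - ln(1 - F(x_j)):
      a = sum (2j - 1) * d_j,  b = sum d_j,  c = sum ln(1 - F(x_j)).
    """
    a = b = c = 0.0
    for j, x in enumerate(part, start=1):
        f = cdf(x)
        log_1mf = math.log(1.0 - f)
        d = math.log(f) - log_1mf
        a += (2 * j - 1) * d
        b += d
        c += log_1mf
    return (len(part), a, b, c)


def combine(summaries):
    """Merge per-partition summaries, given in partition order, into A^2.

    A global rank is i = offset + j, so (2i - 1) * d = (2j - 1) * d + 2 * offset * d,
    and the remaining term 2n * ln(1 - F) only needs the total count n.
    """
    n = sum(m for m, _, _, _ in summaries)
    total = 0.0
    offset = 0  # number of observations in all earlier partitions
    for m, a, b, c in summaries:
        total += a + 2 * offset * b + 2 * n * c
        offset += m
    return -n - total / n


# Usage: three contiguous slices of a sorted sample stand in for partitions.
data = sorted([-1.2, 0.3, 0.8, -0.5, 1.9, 0.1, -2.0, 1.1, 0.6, -0.9])
parts = [data[:3], data[3:7], data[7:]]
a2 = combine([partition_summary(p, normal_cdf) for p in parts])
```

The sketch does not guard against an empty sample or against CDF values of exactly 0 or 1, where the logarithm is undefined. A real implementation would also need the critical values for each supported distribution, which are not shown here.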