[ 
https://issues.apache.org/jira/browse/SPARK-8884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-8884:
---------------------------------
    Target Version/s: 1.5.0

> 1-sample Anderson-Darling Goodness-of-Fit test
> ----------------------------------------------
>
>                 Key: SPARK-8884
>                 URL: https://issues.apache.org/jira/browse/SPARK-8884
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Jose Cambronero
>            Priority: Minor
>
> We have implemented a 1-sample Anderson-Darling goodness-of-fit test to add 
> to the current hypothesis testing functionality. The current implementation 
> supports various distributions (normal, exponential, gumbel, logistic, and 
> weibull). However, users must provide distribution parameters for all except 
> normal/exponential (in which case they are estimated from the data). In 
> contrast to other tests, such as the Kolmogorov Smirnov test, we only support 
> specific distributions as the critical values depend on the distribution 
> being tested. 
> The distributed implementation of AD takes advantage of the fact that we can 
> calculate a portion of the statistic within each partition of a sorted data 
> set, independent of the global order of those observations. We can then carry 
> some additional information that allows us to adjust the final amounts once 
> we have collected 1 result per partition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to