[
https://issues.apache.org/jira/browse/SPARK-8598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xiangrui Meng updated SPARK-8598:
---------------------------------
Assignee: Jose Cambronero
> Implementation of 1-sample, two-sided, Kolmogorov Smirnov Test for RDDs
> -----------------------------------------------------------------------
>
> Key: SPARK-8598
> URL: https://issues.apache.org/jira/browse/SPARK-8598
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Jose Cambronero
> Assignee: Jose Cambronero
> Priority: Minor
>
> We have implemented a one-sample, two-sided version of the
> Kolmogorov-Smirnov test, which tests the null hypothesis that the sample
> comes from a given continuous distribution. The functionality is exposed
> through three functions: one that takes an RDD[Double] of the data and a
> lambda that calculates the CDF; one that takes an RDD[Double] and an
> Iterator[(Double,Double,Double)] => Iterator[Double], which uses
> mapPartitions to perform the calculation efficiently when the CDF
> calculation requires a non-serializable object (e.g. the Apache Commons
> Math real distributions); and one that takes an RDD[Double] and the String
> name of the theoretical distribution to be used. The corresponding result
> class has been added, as well as tests in the HypothesisTestSuite.
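
For readers unfamiliar with the statistic, here is a minimal local sketch of
the one-sample, two-sided computation described above. It works on an
in-memory Seq[Double] rather than an RDD, and the names KsSketch and
ksStatistic are hypothetical, chosen for illustration only. It is not the
code attached to this issue.

```scala
object KsSketch {
  /** Two-sided one-sample KS statistic: the largest gap between the
    * empirical CDF of `sample` and the theoretical `cdf`.
    * Hypothetical helper for illustration, not part of MLlib. */
  def ksStatistic(sample: Seq[Double], cdf: Double => Double): Double = {
    require(sample.nonEmpty, "sample must not be empty")
    val sorted = sample.sorted
    val n = sorted.length.toDouble
    sorted.zipWithIndex.map { case (x, i) =>
      val f = cdf(x)
      // The empirical CDF jumps from i/n to (i+1)/n at x, so the gap is
      // checked on both sides of the jump.
      math.max(f - i / n, (i + 1) / n - f)
    }.max
  }

  def main(args: Array[String]): Unit = {
    // CDF of the uniform distribution on [0, 1], used as the null hypothesis.
    val uniformCdf: Double => Double = x => math.min(1.0, math.max(0.0, x))
    val d = ksStatistic(Seq(0.7, 0.1, 0.4), uniformCdf)
    println(s"KS statistic: $d")
  }
}
```

In a distributed setting the same per-point comparison would presumably run
after a global sort of the RDD, with the per-partition maxima combined into
one value. That step is omitted here to keep the sketch self-contained.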