Jose Cambronero created SPARK-8598:
--------------------------------------

             Summary: Implementation of 1-sample, two-sided, Kolmogorov Smirnov 
Test for RDDs
                 Key: SPARK-8598
                 URL: https://issues.apache.org/jira/browse/SPARK-8598
             Project: Spark
          Issue Type: New Feature
          Components: MLlib
            Reporter: Jose Cambronero
            Priority: Minor


We have implemented a 1-sample, two-sided version of the Kolmogorov-Smirnov 
test, which tests the null hypothesis that the sample comes from a given 
continuous distribution. We provide three functions to access the 
functionality: a function that takes an RDD[Double] of the data and a 
lambda that calculates the CDF; a function that takes an RDD[Double] and an 
Iterator[(Double,Double,Double)] => Iterator[Double], which is applied via 
mapPartitions to provide an optimized way to perform the calculation when 
the CDF calculation requires a non-serializable object (e.g. the Apache 
Commons Math real distributions); and a function that takes an RDD[Double] 
and the String name of the theoretical distribution to be used. The 
appropriate result class has been added, as well as tests in 
HypothesisTestSuite.
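
For readers unfamiliar with the test, the statistic described above can be 
sketched in a few lines of plain Python. This is only an illustration of the 
1-sample, two-sided statistic on a local list, not the Spark code in this 
patch: the names ks_statistic, ks_p_value and normal_cdf are hypothetical, and 
the p-value uses the asymptotic Kolmogorov series, which is an assumption 
here and is only a rough approximation for small samples.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """CDF of a normal distribution, used here as an example theoretical CDF."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Two-sided 1-sample KS statistic: D = sup_x |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so the largest
        # gap to the theoretical CDF can occur on either side of the jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_p_value(d, n, terms=100):
    """Asymptotic two-sided p-value: 2 * sum_k (-1)^(k-1) exp(-2 k^2 n d^2)."""
    if d <= 0.0:
        return 1.0
    t = math.sqrt(n) * d
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
            for k in range(1, terms + 1))
    # Clamp, since the truncated series can drift slightly outside [0, 1].
    return min(1.0, max(0.0, 2.0 * s))


if __name__ == "__main__":
    data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
    d = ks_statistic(data, normal_cdf)
    print("D =", d, "p ~", ks_p_value(d, len(data)))
```

In a distributed setting the sorted position of each point has to be 
recovered across partitions, which is where a per-partition function such as 
the one described above becomes relevant; this sketch sidesteps that by 
sorting locally.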



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
