Bettadapura Srinath Sharma created SPARK-20802:
--------------------------------------------------

             Summary: kolmogorovSmirnovTest in pyspark.mllib.stat.Statistics throws net.razorvine.pickle.PickleException when input data is normally distributed (no error when data is not normally distributed)
                 Key: SPARK-20802
                 URL: https://issues.apache.org/jira/browse/SPARK-20802
             Project: Spark
          Issue Type: Bug
          Components: MLlib, PySpark
    Affects Versions: 2.1.1
         Environment: Linux version 4.4.14-smp
x86/fpu: Legacy x87 FPU detected.
using command line: 
bash-4.3$ ./bin/spark-submit ~/work/python/Features.py
bash-4.3$ pwd
/home/bsrsharma/spark-2.1.1-bin-hadoop2.7
export JAVA_HOME=/home/bsrsharma/jdk1.8.0_121

            Reporter: Bettadapura Srinath Sharma


In Scala (correct behavior):
code:
testResult = Statistics.kolmogorovSmirnovTest(vecRDD, "norm", means(j), stdDev(j))
produces:
17/05/18 10:52:53 INFO FeatureLogger: Kolmogorov-Smirnov test summary:
degrees of freedom = 0 
statistic = 0.005495681749849268 
pValue = 0.9216108887428276 
No presumption against null hypothesis: Sample follows theoretical distribution.

In Python (incorrect behavior):
code:
testResult = Statistics.kolmogorovSmirnovTest(vecRDD, 'norm', numericMean[j], numericSD[j])

causes this error:
17/05/17 21:59:23 ERROR Executor: Exception in task 0.0 in stage 14.0 (TID 14)
net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.dtype)
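A sketch of one possible workaround, under an assumption the report does not confirm: that the elements of vecRDD are NumPy scalars (e.g. numpy.float64) rather than built-in Python floats. This particular PickleException is commonly seen when NumPy scalars are pickled to the JVM, because their pickled form references numpy.dtype, which the Pyrolite unpickler on the executor cannot construct. The helper names below (to_builtin_float, prepare_ks_input) are hypothetical; the idea is simply to map the RDD to plain floats before calling kolmogorovSmirnovTest.

```python
def to_builtin_float(value):
    """Convert a scalar (for example numpy.float64) to a built-in Python float.

    Assumption: built-in floats pickle as plain values, whereas NumPy scalars
    carry a reference to numpy.dtype that the JVM-side unpickler may reject.
    """
    return float(value)


def prepare_ks_input(rdd):
    """Hypothetical helper: map every element of an RDD to a built-in float."""
    return rdd.map(to_builtin_float)


# Hypothetical usage in the reporter's script (needs a live SparkContext):
# from pyspark.mllib.stat import Statistics
# testResult = Statistics.kolmogorovSmirnovTest(
#     prepare_ks_input(vecRDD), 'norm',
#     float(numericMean[j]), float(numericSD[j]))
```

If the assumption holds, the failure would be tied to the element type of vecRDD rather than to whether the data is normally distributed; that is not verified here.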
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
