Bettadapura Srinath Sharma created SPARK-20802:
--------------------------------------------------
Summary: kolmogorovSmirnovTest in pyspark.mllib.stat.Statistics
throws net.razorvine.pickle.PickleException when input data is normally
distributed (no error when data is not normally distributed)
Key: SPARK-20802
URL: https://issues.apache.org/jira/browse/SPARK-20802
Project: Spark
Issue Type: Bug
Components: MLlib, PySpark
Affects Versions: 2.1.1
Environment: Linux version 4.4.14-smp
x86/fpu: Legacy x87 FPU detected.
using command line:
bash-4.3$ ./bin/spark-submit ~/work/python/Features.py
bash-4.3$ pwd
/home/bsrsharma/spark-2.1.1-bin-hadoop2.7
export JAVA_HOME=/home/bsrsharma/jdk1.8.0_121
Reporter: Bettadapura Srinath Sharma
In Scala (correct behavior):
code:
testResult = Statistics.kolmogorovSmirnovTest(vecRDD, "norm", means(j),
stdDev(j))
produces:
17/05/18 10:52:53 INFO FeatureLogger: Kolmogorov-Smirnov test summary:
degrees of freedom = 0
statistic = 0.005495681749849268
pValue = 0.9216108887428276
No presumption against null hypothesis: Sample follows theoretical distribution.
In Python (incorrect behavior):
the code:
testResult = Statistics.kolmogorovSmirnovTest(vecRDD, 'norm', numericMean[j],
numericSD[j])
causes this error:
17/05/17 21:59:23 ERROR Executor: Exception in task 0.0 in stage 14.0 (TID 14)
net.razorvine.pickle.PickleException: expected zero arguments for construction
of ClassDict (for numpy.dtype)
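
A possible workaround, not confirmed in this report: the exception names numpy.dtype and is raised inside an executor task, which suggests that NumPy scalars (for example numpy.float64) in the RDD or in the parameters are reaching the JVM-side unpickler. The sketch below assumes that this is the cause and converts everything to built-in Python floats before calling the test. The helper names to_builtin_floats and ks_test_norm are hypothetical; only Statistics.kolmogorovSmirnovTest and RDD.map come from PySpark.

```python
def to_builtin_floats(values):
    """Return the values as built-in Python floats.

    float() accepts NumPy scalars such as numpy.float64, so this strips
    the NumPy type that the JVM-side unpickler appears to reject.
    """
    return [float(v) for v in values]


def ks_test_norm(rdd, mean, std_dev):
    """Run the KS test against N(mean, std_dev) using only built-in floats.

    Assumption: `rdd` is an RDD of numeric scalars (possibly NumPy scalars).
    """
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.mllib.stat import Statistics

    native_rdd = rdd.map(float)  # convert each element on the Python side
    mean, std_dev = to_builtin_floats([mean, std_dev])
    return Statistics.kolmogorovSmirnovTest(native_rdd, "norm", mean, std_dev)
```

With the names from the report, the call would become ks_test_norm(vecRDD, numericMean[j], numericSD[j]). If the error persists after this conversion, the NumPy-type assumption is wrong and the cause lies elsewhere.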