[ https://issues.apache.org/jira/browse/SPARK-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-13639.
----------------------------------
Resolution: Incomplete
> Statistics.colStats(rdd).mean and variance should handle NaN in the input vectors
> ---------------------------------------------------------------------------------
>
> Key: SPARK-13639
> URL: https://issues.apache.org/jira/browse/SPARK-13639
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: yuhao yang
> Priority: Trivial
> Labels: bulk-closed
>
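> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.stat.Statistics
>
> // Run in spark-shell, where `sc` is the predefined SparkContext
> val denseData = Array(
>   Vectors.dense(3.8, 0.0, 1.8),
>   Vectors.dense(1.7, 0.9, 0.0),
>   Vectors.dense(Double.NaN, 0.0, 0.0)
> )
> val rdd = sc.parallelize(denseData)
> println(Statistics.colStats(rdd).mean)
> // prints [NaN,0.3,0.6]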
> This is a proposal for discussing how NaN values in the input vectors should
> be handled. We could either ignore NaN values in the computation, or keep the
> current behavior and output NaN, which serves as a warning to the caller.
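To make the "ignore NaN" option concrete, here is a minimal sketch in plain Scala (no Spark dependency) that computes per-column mean and sample variance while skipping NaN entries. The object name `NanAwareColStats` and method `meanAndVariance` are hypothetical and are not part of the MLlib API; the sketch only assumes rows of equal length and uses Welford's online update so a single pass suffices, which is roughly how a distributed summarizer would accumulate per-partition state.

```scala
// Hypothetical helper, NOT an MLlib API: per-column mean and sample variance
// that skip NaN entries instead of propagating them.
object NanAwareColStats {

  /** Returns (means, variances), one entry per column.
    * NaN entries are ignored; a column with no non-NaN values yields NaN. */
  def meanAndVariance(rows: Seq[Array[Double]]): (Array[Double], Array[Double]) = {
    require(rows.nonEmpty, "need at least one row")
    val n = rows.head.length
    val count = Array.fill(n)(0L)  // non-NaN values seen per column
    val mean = Array.fill(n)(0.0)  // running mean per column
    val m2 = Array.fill(n)(0.0)    // running sum of squared deviations (Welford)
    for (row <- rows; j <- 0 until n) {
      val x = row(j)
      if (!x.isNaN) {
        count(j) += 1
        val delta = x - mean(j)
        mean(j) += delta / count(j)
        m2(j) += delta * (x - mean(j))
      }
    }
    val means = Array.tabulate(n)(j => if (count(j) == 0) Double.NaN else mean(j))
    // Sample variance with denominator (count - 1); a single value gives 0.0
    val variances = Array.tabulate(n) { j =>
      if (count(j) == 0) Double.NaN
      else if (count(j) == 1) 0.0
      else m2(j) / (count(j) - 1)
    }
    (means, variances)
  }
}
```

Applied to the three vectors from the report, the NaN in the first column is skipped, so the means come out to approximately [2.75, 0.3, 0.6] instead of [NaN, 0.3, 0.6]. The trade-off is that silently dropping NaN hides data-quality problems, which is the argument for the second option of keeping NaN in the output.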
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)