[jira] [Created] (SPARK-13639) Statistics.colStats(rdd).mean and variance should handle NaN in the input vectors

yuhao yang (JIRA) Wed, 02 Mar 2016 21:47:16 -0800

yuhao yang created SPARK-13639:
----------------------------------

             Summary: Statistics.colStats(rdd).mean and variance should handle NaN in the input vectors
                 Key: SPARK-13639
                 URL: https://issues.apache.org/jira/browse/SPARK-13639
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
            Reporter: yuhao yang
            Priority: Trivial



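    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.stat.Statistics

    // The third vector has NaN in its first column.
    val denseData = Array(
      Vectors.dense(3.8, 0.0, 1.8),
      Vectors.dense(1.7, 0.9, 0.0),
      Vectors.dense(Double.NaN, 0.0, 0.0)
    )

    // sc is the SparkContext, e.g. as provided by spark-shell.
    val rdd = sc.parallelize(denseData)
    println(Statistics.colStats(rdd).mean)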

[NaN,0.3,0.6]

This is a proposal for discussion on how colStats should handle NaN values in 
the input vectors. We could either ignore NaN values in the computation, or 
keep the current behavior and output NaN for the affected column, which at 
least serves as a warning.
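
To make the two options concrete, here is a minimal sketch in plain Scala, 
with no Spark involved. The object and method names (NaNColumnMeans, 
propagatingMeans, nanIgnoringMeans) are hypothetical and are not part of 
MLlib; an ordinary Seq of arrays stands in for the RDD of vectors. It only 
illustrates the semantics under discussion, not how colStats is or should be 
implemented.

```scala
// Illustration only: hypothetical helpers, not MLlib API.
object NaNColumnMeans {

  /** Column means where a single NaN makes that column's mean NaN
    * (mirrors the behavior shown in the report). */
  def propagatingMeans(rows: Seq[Array[Double]]): Array[Double] = {
    val numCols = rows.head.length
    Array.tabulate(numCols)(j => rows.map(_(j)).sum / rows.size)
  }

  /** Column means computed over the non-NaN entries only
    * (the "ignore NaN" option). A column with no valid entries yields NaN. */
  def nanIgnoringMeans(rows: Seq[Array[Double]]): Array[Double] = {
    val numCols = rows.head.length
    Array.tabulate(numCols) { j =>
      val valid = rows.map(_(j)).filterNot(_.isNaN)
      if (valid.isEmpty) Double.NaN else valid.sum / valid.size
    }
  }

  def main(args: Array[String]): Unit = {
    // Same values as the vectors in the report.
    val rows = Seq(
      Array(3.8, 0.0, 1.8),
      Array(1.7, 0.9, 0.0),
      Array(Double.NaN, 0.0, 0.0)
    )
    println(propagatingMeans(rows).mkString("[", ",", "]"))
    println(nanIgnoringMeans(rows).mkString("[", ",", "]"))
  }
}
```

Note that under the "ignore" option the first column is averaged over two 
values while the other columns are averaged over three, so the per-column 
counts would differ. That is one of the points the discussion would need to 
settle.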





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
