[ https://issues.apache.org/jira/browse/SPARK-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-15660:
----------------------------------
Priority: Minor (was: Major)
Description:
In SPARK-11490, `variance/stdev` are redefined as the sample variance/stdev
instead of the population ones. This PR updates the comments to prevent users
from misunderstanding them. The following API docs will be updated.
- http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.api.java.JavaDoubleRDD
- http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.rdd.DoubleRDDFunctions
- http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.util.StatCounter
Also, this PR explicitly adds the `popVariance` and `popStdev` functions.
was:
In SPARK-11490, `variance/stdev` are redefined as the **sample**
`variance/stdev` instead of the population ones.
This issue addresses the only remaining legacy behavior in RDD. This may cause
breaking changes, but we should be consistent in Spark 2.0 if possible.
{code}
scala> val ds = spark.createDataset(Seq(1.0, 2.0, 3.0))
ds: org.apache.spark.sql.Dataset[Double] = [value: double]
scala> ds.describe().show()
+-------+-----+
|summary|value|
+-------+-----+
|  count|    3|
|   mean|  2.0|
| stddev|  1.0|
|    min|  1.0|
|    max|  3.0|
+-------+-----+
scala> ds.rdd.stdev
res1: Double = 0.816496580927726
{code}
Issue Type: Improvement (was: Bug)
Summary: Update RDD `variance/stdev` description and add
popVariance/popStdev (was: RDD and Dataset should show the consistent value
for variance/stdev.)
> Update RDD `variance/stdev` description and add popVariance/popStdev
> --------------------------------------------------------------------
>
> Key: SPARK-15660
> URL: https://issues.apache.org/jira/browse/SPARK-15660
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Dongjoon Hyun
> Priority: Minor
>
> In SPARK-11490, `variance/stdev` are redefined as the sample variance/stdev
> instead of the population ones. This PR updates the comments to prevent users
> from misunderstanding them. The following API docs will be updated.
> - http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.api.java.JavaDoubleRDD
> - http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.rdd.DoubleRDDFunctions
> - http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.util.StatCounter
> Also, this PR explicitly adds the `popVariance` and `popStdev` functions.
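
For reference, the distinction between the sample and population statistics discussed in this issue can be sketched in plain Scala, without Spark. This is only an illustration under stated assumptions: the object `VarianceDemo` and its helper methods are hypothetical names written for this note, not Spark's implementation, and they only mirror the names mentioned in the description. The population form divides the sum of squared deviations by n, while the sample form divides by n - 1.

```scala
// Hypothetical standalone helpers; not part of Spark.
object VarianceDemo {
  // Sum of squared deviations from the mean.
  private def squaredDeviations(xs: Seq[Double]): Double = {
    val mean = xs.sum / xs.size
    xs.map(x => (x - mean) * (x - mean)).sum
  }

  // Population variance: divide by n.
  def popVariance(xs: Seq[Double]): Double = {
    require(xs.nonEmpty, "need at least one value")
    squaredDeviations(xs) / xs.size
  }

  // Sample variance: divide by n - 1 (Bessel's correction).
  def sampleVariance(xs: Seq[Double]): Double = {
    require(xs.size > 1, "need at least two values")
    squaredDeviations(xs) / (xs.size - 1)
  }

  def popStdev(xs: Seq[Double]): Double = math.sqrt(popVariance(xs))

  def sampleStdev(xs: Seq[Double]): Double = math.sqrt(sampleVariance(xs))

  def main(args: Array[String]): Unit = {
    val data = Seq(1.0, 2.0, 3.0)
    println(s"population stdev: ${popStdev(data)}")   // approximately 0.8165
    println(s"sample stdev:     ${sampleStdev(data)}") // 1.0
  }
}
```

For the data `Seq(1.0, 2.0, 3.0)`, the sample form gives the 1.0 reported by `ds.describe()` in the original description, and the population form gives the roughly 0.8165 reported by `ds.rdd.stdev`.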