[
https://issues.apache.org/jira/browse/SPARK-13508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sidharth Rajeev updated SPARK-13508:
------------------------------------
Comment: was deleted
(was: In the Summary statistics section, refer to MultivariateStatisticalSummary
for details on the API.
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.stat.MultivariateStatisticalSummary;
import org.apache.spark.mllib.stat.Statistics;
JavaSparkContext jsc = ...
JavaRDD<Vector> mat = ... // an RDD of Vectors
// Compute column summary statistics.
MultivariateStatisticalSummary summary = Statistics.colStats(mat.rdd());
System.out.println(summary.mean()); // a dense vector containing the mean value for each column
System.out.println(summary.variance()); // column-wise variance
System.out.println(summary.numNonzeros()); // number of nonzeros in each column
What I am proposing is that a summary.stdev() method be added alongside these.
)
> For direct retrieval of Standard Deviation for Analytics
> --------------------------------------------------------
>
> Key: SPARK-13508
> URL: https://issues.apache.org/jira/browse/SPARK-13508
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Environment: Operating System: Ubuntu
> Software: Spark
> Hardware Specification: Normal
> Reporter: Sidharth Rajeev
> Priority: Minor
> Labels: easyfix, newbie
> Original Estimate: 0.05h
> Remaining Estimate: 0.05h
>
> To make analytics easier, I would like to add direct access to the standard
> deviation. Currently only the variance is directly available, and taking its
> square root gives the standard deviation. Exposing it as a built-in method
> would save data analysts and scientists time.
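
The workaround described in the issue (taking the square root of each column's variance) can be sketched as below. This is only an illustration under stated assumptions: StdDevSketch and fromVariance are hypothetical names, not part of Spark, and the input array stands in for what summary.variance().toArray() would return.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Spark: derives per-column standard
// deviations from per-column variances.
final class StdDevSketch {

    // Returns a new array whose i-th entry is sqrt(variance[i]).
    static double[] fromVariance(double[] variance) {
        double[] stdev = new double[variance.length];
        for (int i = 0; i < variance.length; i++) {
            stdev[i] = Math.sqrt(variance[i]);
        }
        return stdev;
    }

    public static void main(String[] args) {
        // Stand-in for summary.variance().toArray(); the values are made up.
        double[] variance = {4.0, 9.0, 0.0};
        System.out.println(Arrays.toString(fromVariance(variance)));
    }
}
```

A built-in method as requested would presumably do the same element-wise computation, so the result inherits whatever variance definition the summary uses.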
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)