[
https://issues.apache.org/jira/browse/SPARK-20077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239846#comment-16239846
]
Teng Peng commented on SPARK-20077:
-----------------------------------
[~srowen] This page, https://spark.apache.org/docs/latest/ml-statistics.html,
already covers the Pearson and Spearman coefficients. Just to make sure: do we
need something beyond this?
Correlation computes the correlation matrix for the input Dataset of Vectors
using the specified method. The output will be a DataFrame that contains the
correlation matrix of the column of vectors.
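
// Assumes a SparkSession named `spark` is in scope (as in spark-shell).
import org.apache.spark.ml.linalg.{Matrix, Vectors}
import org.apache.spark.ml.stat.Correlation
import org.apache.spark.sql.Row
import spark.implicits._  // needed for .toDF on a local Seq

// Four 4-dimensional feature vectors, mixing sparse and dense representations.
val data = Seq(
  Vectors.sparse(4, Seq((0, 1.0), (3, -2.0))),
  Vectors.dense(4.0, 5.0, 0.0, 3.0),
  Vectors.dense(6.0, 7.0, 0.0, 8.0),
  Vectors.sparse(4, Seq((0, 9.0), (3, 1.0)))
)

// Wrap each vector in a Tuple1 so the DataFrame has a single "features" column.
val df = data.map(Tuple1.apply).toDF("features")

// The default method is Pearson.
val Row(coeff1: Matrix) = Correlation.corr(df, "features").head
println("Pearson correlation matrix:\n" + coeff1.toString)

// Pass "spearman" to compute the Spearman rank correlation instead.
val Row(coeff2: Matrix) = Correlation.corr(df, "features", "spearman").head
println("Spearman correlation matrix:\n" + coeff2.toString)
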
> Documentation for ml.stats.Correlation
> --------------------------------------
>
> Key: SPARK-20077
> URL: https://issues.apache.org/jira/browse/SPARK-20077
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Affects Versions: 2.1.0
> Reporter: Timothy Hunter
> Priority: Minor
>
> Now that (Pearson) correlations are available in spark.ml, we need to write
> some documentation to go along with this feature. For now, it can simply be
> based on the unit tests.