Xiangrui Meng created SPARK-10385:
-------------------------------------

             Summary: Bivariate statistics as UDAFs
                 Key: SPARK-10385
                 URL: https://issues.apache.org/jira/browse/SPARK-10385
             Project: Spark
          Issue Type: Umbrella
          Components: ML, SQL
            Reporter: Xiangrui Meng
            Assignee: Burak Yavuz

Similar to SPARK-10384, it would be nice to have bivariate statistics defined as UDAFs. This JIRA discusses the general implementation and tracks subtasks. Bivariate statistics include:

* continuous: covariance, Pearson's correlation, and Spearman's correlation
* categorical: ??

If we define them as UDAFs, they can be used flexibly with DataFrames, e.g.,

{code}
df.groupBy("key").agg(corr("x", "y"))
{code}
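
To make the UDAF idea concrete, here is a minimal sketch in plain Python (no Spark dependency) of the kind of mergeable aggregation buffer a Pearson correlation or covariance UDAF could keep. It is an illustration under assumptions, not Spark's implementation: the names `CorrBuffer` and `grouped_corr` are hypothetical, and the round-robin "partitions" only mimic how partial aggregates from different tasks would be merged.

```python
import math
from dataclasses import dataclass


@dataclass
class CorrBuffer:
    """Hypothetical constant-size buffer for covariance / Pearson correlation."""
    n: int = 0
    mean_x: float = 0.0
    mean_y: float = 0.0
    m2_x: float = 0.0  # running sum of (x - mean_x)^2
    m2_y: float = 0.0  # running sum of (y - mean_y)^2
    c_xy: float = 0.0  # running sum of (x - mean_x) * (y - mean_y)

    def update(self, x: float, y: float) -> "CorrBuffer":
        # Single-pass (Welford-style) update with one new (x, y) pair.
        self.n += 1
        dx = x - self.mean_x
        dy = y - self.mean_y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)
        return self

    def merge(self, other: "CorrBuffer") -> "CorrBuffer":
        # Combine two partial aggregates computed on disjoint sets of rows.
        if other.n == 0:
            return self
        n = self.n + other.n
        dx = other.mean_x - self.mean_x
        dy = other.mean_y - self.mean_y
        w = self.n * other.n / n
        self.m2_x += other.m2_x + dx * dx * w
        self.m2_y += other.m2_y + dy * dy * w
        self.c_xy += other.c_xy + dx * dy * w
        self.mean_x += dx * other.n / n
        self.mean_y += dy * other.n / n
        self.n = n
        return self

    def covariance(self) -> float:
        # Sample covariance; undefined for fewer than two rows.
        return self.c_xy / (self.n - 1) if self.n > 1 else float("nan")

    def corr(self) -> float:
        # Pearson correlation; undefined if either column is constant.
        denom = math.sqrt(self.m2_x * self.m2_y)
        return self.c_xy / denom if denom > 0 else float("nan")


def grouped_corr(rows, num_partitions=2):
    """Emulate groupBy(key).agg(corr(x, y)) over (key, x, y) tuples."""
    # Step 1: each simulated partition builds its own per-key buffers.
    partials = [dict() for _ in range(num_partitions)]
    for i, (key, x, y) in enumerate(rows):
        partials[i % num_partitions].setdefault(key, CorrBuffer()).update(x, y)
    # Step 2: merge the per-partition buffers key by key, then evaluate.
    merged = {}
    for part in partials:
        for key, buf in part.items():
            merged.setdefault(key, CorrBuffer()).merge(buf)
    return {key: buf.corr() for key, buf in merged.items()}


rows = [("a", 1, 2), ("b", 1, -1), ("a", 2, 4), ("b", 2, -2),
        ("a", 3, 6), ("b", 3, -3), ("a", 4, 8)]
result = grouped_corr(rows)  # "a" is perfectly correlated, "b" anti-correlated
```

The buffer holds six numbers regardless of group size, which is what makes covariance and Pearson's correlation a natural fit for the update/merge/evaluate shape of a UDAF. Spearman's correlation is Pearson's correlation computed on ranks, and ranks depend on every value in the group, so it would likely need a different approach than this constant-size buffer.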