[ https://issues.apache.org/jira/browse/SPARK-23661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421348#comment-16421348 ]
Liang-Chi Hsieh commented on SPARK-23661:
-----------------------------------------

For the implementation of {{Dataset.treeAggregate}}, I'm wondering whether we need to support SQL tree aggregation in all cases. For example, {{RDD.treeAggregate}} can be seen as grouping without keys, and this is the case where tree aggregation helps. For grouping by keys, I'm not sure it really performs much better than non-tree aggregation.

cc [~cloud_fan]

> Implement treeAggregate on Dataset API
> --------------------------------------
>
>                 Key: SPARK-23661
>                 URL: https://issues.apache.org/jira/browse/SPARK-23661
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Liang-Chi Hsieh
>            Priority: Major
>
> Many algorithms in MLlib have still not migrated their internal computing
> workload from {{RDD}} to {{DataFrame}}. {{treeAggregate}} is one of the obstacles
> we need to address in order to complete the migration.
> This ticket is opened to provide {{treeAggregate}} on the Dataset API. For now
> this should be a private API used by the ML component.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
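
To make the "grouping without keys" case from the comment concrete, here is a minimal, Spark-free sketch of tree aggregation over in-memory lists that stand in for partitions. The names {{tree_aggregate}}, {{partitions}} and {{scale}} are illustrative assumptions, not Spark APIs, and the level-by-level merge is a simplification of what {{RDD.treeAggregate}} does rather than its actual implementation.

```python
from functools import reduce
from math import ceil


def tree_aggregate(partitions, zero, seq_op, comb_op, depth=2):
    """Toy model of key-less tree aggregation (not Spark's implementation)."""
    # Step 1: fold every partition locally, as each executor would.
    partials = [reduce(seq_op, part, zero) for part in partitions]
    if not partials:
        return zero
    # Fan-in per level, chosen so that about `depth` levels reach one value.
    scale = max(2, ceil(len(partials) ** (1.0 / depth)))
    # Step 2: merge partial results level by level rather than sending
    # all of them to a single place at once.
    while len(partials) > 1:
        partials = [
            reduce(comb_op, partials[i:i + scale])
            for i in range(0, len(partials), scale)
        ]
    return partials[0]


# Eight hypothetical partitions holding the integers 0..79.
partitions = [list(range(i * 10, (i + 1) * 10)) for i in range(8)]
total = tree_aggregate(
    partitions, 0, lambda acc, x: acc + x, lambda a, b: a + b
)
```

With no keys there is a single final result, so every partial has to meet somewhere, and the intermediate levels spread that merge work out. With grouping keys, the partials are already distributed by key, which is why the benefit is less obvious in that case.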