[jira] [Commented] (SPARK-23661) Implement treeAggregate on Dataset API

Liang-Chi Hsieh (JIRA) Sat, 31 Mar 2018 07:45:12 -0700

    [ https://issues.apache.org/jira/browse/SPARK-23661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421348#comment-16421348 ]


Liang-Chi Hsieh commented on SPARK-23661:
-----------------------------------------

For the implementation of {{Dataset.treeAggregate}}, I'm wondering whether we 
need to support SQL tree aggregation in all cases. For example, 
{{RDD.treeAggregate}} can be seen as grouping without keys, which is the case 
where tree aggregation helps. For grouping by keys, I'm not sure it would 
really perform much better than non-tree aggregation.
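
To make the keyless case concrete, here is a minimal pure-Python sketch (not Spark code, and not the proposed implementation). It simulates partitions as plain lists and counts how many partial results the final merge step has to combine. The names {{flat_aggregate}}, {{tree_aggregate}} and {{fan_in}} are hypothetical and chosen only for illustration; the real {{RDD.treeAggregate}} takes a {{depth}} parameter instead. With grouping keys, the shuffle already spreads the merge work across reducers, which may be why the benefit there is less obvious.

```python
from functools import reduce


def flat_aggregate(partitions, zero, seq_op, comb_op):
    """Aggregate each partition, then merge every partial result in one place."""
    partials = [reduce(seq_op, part, zero) for part in partitions]
    final_merges = len(partials)  # all partials arrive at the final merge step
    return reduce(comb_op, partials, zero), final_merges


def tree_aggregate(partitions, zero, seq_op, comb_op, fan_in=2):
    """Aggregate each partition, then merge partials level by level.

    `fan_in` (an assumption of this sketch) is how many partials are merged
    together at each intermediate level.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    partials = [reduce(seq_op, part, zero) for part in partitions]
    # Keep combining groups of `fan_in` partials until few enough remain
    # for a single final merge.
    while len(partials) > fan_in:
        partials = [
            reduce(comb_op, partials[i:i + fan_in])
            for i in range(0, len(partials), fan_in)
        ]
    final_merges = len(partials)
    return reduce(comb_op, partials, zero), final_merges


if __name__ == "__main__":
    # Eight simulated partitions holding the integers 1..16.
    parts = [[i, i + 1] for i in range(1, 16, 2)]
    add = lambda a, b: a + b
    print(flat_aggregate(parts, 0, add, add))            # (136, 8)
    print(tree_aggregate(parts, 0, add, add, fan_in=2))  # (136, 2)
```

Both functions return the same sum, but the final step merges 8 partials in the flat version and only 2 in the tree version. That reduced pressure on the last merge is the effect tree aggregation is meant to provide when there are no keys.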

cc [~cloud_fan]

> Implement treeAggregate on Dataset API
> --------------------------------------
>
>                 Key: SPARK-23661
>                 URL: https://issues.apache.org/jira/browse/SPARK-23661
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Liang-Chi Hsieh
>            Priority: Major
>
> Many algorithms in MLlib have still not migrated their internal computing 
> workload from {{RDD}} to {{DataFrame}}. {{treeAggregate}} is one of the 
> obstacles we need to address in order to complete the migration.
> This ticket is opened to provide {{treeAggregate}} on the Dataset API. For 
> now this should be a private API used by the ML component.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

