Qiping Li created SPARK-3920:
--------------------------------
Summary: Add option to support aggregation using treeAggregate in
decision tree
Key: SPARK-3920
URL: https://issues.apache.org/jira/browse/SPARK-3920
Project: Spark
Issue Type: Improvement
Components: MLlib
Reporter: Qiping Li
Fix For: 1.2.0
In [SPARK-3366|https://issues.apache.org/jira/browse/SPARK-3366], we used
distributed aggregation to aggregate node stats, which can save computation and
communication time when the shuffle size is very large. But experiments have
shown that if the shuffle size is not large enough (e.g., shallow trees), this
causes some performance loss (greater than 20% in some cases). We should support
both options for aggregation (distributed aggregation and treeAggregate) so that
users can choose the proper one based on their needs.
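
To make the trade-off concrete, here is a minimal pure-Python sketch of the two aggregation styles. It does not use Spark or reproduce MLlib's implementation. The toy partitions, the function names, and the use_tree_aggregate flag are all hypothetical and serve only to illustrate what a user-selectable option might look like.

```python
from collections import defaultdict

# Hypothetical toy data: each partition holds (node_id, stat) pairs.
NUM_NODES = 3
partitions = [
    [(0, 1.0), (1, 2.0), (0, 3.0)],
    [(2, 4.0), (1, 5.0)],
    [(0, 6.0), (2, 7.0)],
    [(1, 8.0)],
]

def aggregate_by_key(parts):
    """Shuffle-style: combine locally per node id, then merge records by key."""
    merged = defaultdict(float)
    for part in parts:
        local = defaultdict(float)
        for node_id, stat in part:
            local[node_id] += stat
        # In a real cluster these per-key records would be shuffled.
        for node_id, partial in local.items():
            merged[node_id] += partial
    return [merged[i] for i in range(NUM_NODES)]

def aggregate_tree(parts):
    """Tree-style: one dense array per partition, merged pairwise by level."""
    level = []
    for part in parts:
        dense = [0.0] * NUM_NODES
        for node_id, stat in part:
            dense[node_id] += stat
        level.append(dense)
    if not level:
        return [0.0] * NUM_NODES
    while len(level) > 1:
        nxt = [
            [x + y for x, y in zip(level[i], level[i + 1])]
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd one out moves up unchanged
        level = nxt
    return level[0]

def aggregate_node_stats(parts, use_tree_aggregate=False):
    """Hypothetical switch between the two strategies."""
    if use_tree_aggregate:
        return aggregate_tree(parts)
    return aggregate_by_key(parts)
```

Both paths return the same per-node totals; they differ only in how partial results are moved and merged, which is where the performance difference for small versus large shuffle sizes would come from.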