[ https://issues.apache.org/jira/browse/SPARK-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090492#comment-14090492 ]
Apache Spark commented on SPARK-2590:
-------------------------------------
User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/1853
> Add config property to disable incremental collection used in Thrift server
> ---------------------------------------------------------------------------
>
> Key: SPARK-2590
> URL: https://issues.apache.org/jira/browse/SPARK-2590
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Cheng Lian
> Assignee: Cheng Lian
> Priority: Blocker
>
> {{SparkSQLOperationManager}} uses {{RDD.toLocalIterator}} to collect the
> result set one partition at a time. This helps avoid running out of memory
> (OOM) when the result is large, but it adds job scheduling overhead because
> each partition is collected by a separate job. Users may want to disable
> this behavior when the result set is expected to be small.
>
> *UPDATE* Incremental collection hurts performance because the tasks of the
> last stage of the RDD DAG generated from the SQL query plan are executed
> sequentially instead of in parallel. We therefore decided to disable it by
> default.
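
The trade-off described in the issue can be illustrated with a small toy model. This is not Spark code and not the change made in the pull request: `ToyRDD`, `fetch_rows`, and the `incremental_collect` flag are hypothetical names chosen for this sketch, and "jobs" are only counted, not scheduled. It assumes nothing beyond what the description states: collecting everything at once takes one job, while incremental collection takes one job per partition and runs them one after another.

```python
from typing import Iterable, Iterator, List


class ToyRDD:
    """Hypothetical stand-in for an RDD: a list of partitions plus a job counter."""

    def __init__(self, partitions: Iterable[Iterable[int]]) -> None:
        self.partitions: List[List[int]] = [list(p) for p in partitions]
        self.jobs_run = 0

    def collect(self) -> List[int]:
        # One job computes every partition; the whole result is held at once.
        self.jobs_run += 1
        return [row for part in self.partitions for row in part]

    def to_local_iterator(self) -> Iterator[int]:
        # One job per partition, started only when the consumer reaches it,
        # so partitions are processed sequentially.
        for part in self.partitions:
            self.jobs_run += 1
            yield from part


def fetch_rows(rdd: ToyRDD, incremental_collect: bool = False) -> Iterator[int]:
    """Pick the collection strategy from a flag (off by default, as in the UPDATE)."""
    if incremental_collect:
        return rdd.to_local_iterator()
    return iter(rdd.collect())


rdd = ToyRDD([[1, 2], [3], [4, 5, 6]])
rows = list(fetch_rows(rdd, incremental_collect=True))
print(rows, rdd.jobs_run)  # prints: [1, 2, 3, 4, 5, 6] 3
```

With the flag left at its default, the same three partitions are fetched with a single job, at the cost of holding the full result in memory at once.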