Liang-Chi Hsieh created SPARK-18487:
---------------------------------------
Summary: Consume all elements for Dataset.show/take to avoid memory leak
Key: SPARK-18487
URL: https://issues.apache.org/jira/browse/SPARK-18487
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: Liang-Chi Hsieh
Methods such as Dataset.show and take use Limit (CollectLimitExec), which
leverages SparkPlan.executeTake to efficiently collect the required number of
elements back to the driver.
However, under whole-stage codegen, some operators (e.g., HashAggregate) release
their resources only after all elements have been consumed. Because executeTake
stops reading once it has collected enough elements, those resources are never
released, so Dataset.show, for example, causes a memory leak.
We should consume all elements in the iterator to avoid the memory leak.
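The following is a minimal Python sketch of the problem and of the proposed fix. It is an illustration only, not Spark's code: ResourceHoldingIterator, take and take_and_drain are hypothetical names, and the iterator is assumed to stand in for a whole-stage codegen iterator (e.g. one backed by HashAggregate) that frees its memory only when it is exhausted.

```python
from itertools import islice
from typing import Iterator, List


class ResourceHoldingIterator:
    """Hypothetical stand-in for a codegen iterator that frees its buffer
    only once every element has been consumed."""

    def __init__(self, rows: List[int]) -> None:
        self._rows = iter(rows)
        self.buffer = list(rows)  # simulated memory held by the operator
        self.released = False

    def __iter__(self) -> "ResourceHoldingIterator":
        return self

    def __next__(self) -> int:
        try:
            return next(self._rows)
        except StopIteration:
            # Resources are released only when the iterator is exhausted.
            self._release()
            raise

    def _release(self) -> None:
        self.buffer.clear()
        self.released = True


def take(it: Iterator[int], n: int) -> List[int]:
    """Take the first n elements and stop reading (executeTake-style)."""
    return list(islice(it, n))


def take_and_drain(it: Iterator[int], n: int) -> List[int]:
    """Take the first n elements, then consume the rest so that the
    iterator reaches its end and releases its resources."""
    result = list(islice(it, n))
    for _ in it:  # discard the remaining elements
        pass
    return result


leaky = ResourceHoldingIterator(list(range(10)))
print(take(leaky, 3), leaky.released)  # [0, 1, 2] False

drained = ResourceHoldingIterator(list(range(10)))
print(take_and_drain(drained, 3), drained.released)  # [0, 1, 2] True
```

In this sketch, draining trades extra work (the remaining elements are still computed) for the guarantee that the end-of-iteration cleanup runs.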
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)