[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-17020.
----------------------------------
Resolution: Incomplete
> Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data
> ------------------------------------------------------------------------------
>
> Key: SPARK-17020
> URL: https://issues.apache.org/jira/browse/SPARK-17020
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 1.6.1, 1.6.2, 2.0.0
> Reporter: Roi Reshef
> Priority: Major
> Labels: bulk-closed
> Attachments: dataframe_cache.PNG, rdd_cache.PNG
>
>
> Calling DataFrame's lazy val .rdd results in a new RDD whose partitions
> are poorly distributed across the cluster. Moreover, any attempt to
> repartition this RDD further will fail.
> Attached are screenshots of the original DataFrame when cached and of the
> resulting RDD when cached.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)