[ https://issues.apache.org/jira/browse/SPARK-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shivaram Venkataraman resolved SPARK-8277.
------------------------------------------
Resolution: Fixed
Assignee: Maciej Szymkiewicz
Fix Version/s: 1.6.0
Resolved by https://github.com/apache/spark/pull/9099
> SparkR createDataFrame is slow
> ------------------------------
>
> Key: SPARK-8277
> URL: https://issues.apache.org/jira/browse/SPARK-8277
> Project: Spark
> Issue Type: Bug
> Components: SparkR
> Affects Versions: 1.4.0
> Reporter: Shivaram Venkataraman
> Assignee: Maciej Szymkiewicz
> Fix For: 1.6.0
>
>
> For example, calling `createDataFrame` on the data from
> http://s3-us-west-2.amazonaws.com/sparkr-data/flights.csv takes a really long
> time.
> This is mainly because we convert the data frame to a list in order to
> parallelize it by rows, and that conversion is very slow for large data
> frames.
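
The cost described above can be illustrated with a small language-neutral sketch. This is not SparkR code and not the change made in the linked pull request; it is written in Python purely to show the shape of the problem. The names `to_rows`, `to_column_slices`, and `sample` are hypothetical, and the values in `sample` are made up rather than taken from flights.csv. Row-wise conversion allocates one small list per row, while slicing whole columns produces only one object per slice.

```python
from typing import Any, Dict, List


def to_rows(columns: Dict[str, List[Any]]) -> List[List[Any]]:
    """Row-wise conversion: builds one small list for every row."""
    names = list(columns)
    n = len(columns[names[0]]) if names else 0
    return [[columns[name][i] for name in names] for i in range(n)]


def to_column_slices(
    columns: Dict[str, List[Any]], num_slices: int
) -> List[Dict[str, List[Any]]]:
    """Alternative: cut each column into contiguous slices, one dict per slice."""
    names = list(columns)
    n = len(columns[names[0]]) if names else 0
    step = max(1, -(-n // num_slices))  # ceiling division, at least 1
    return [
        {name: columns[name][start:start + step] for name in names}
        for start in range(0, n, step)
    ]


# Hypothetical columnar data standing in for a local data frame.
sample = {"carrier": ["AA", "UA", "DL", "WN"], "delay": [5, 12, 0, 7]}

rows = to_rows(sample)                # 4 objects, one per row
slices = to_column_slices(sample, 2)  # 2 objects, one per slice
```

With millions of rows, the first approach creates millions of short lists before anything is distributed, whereas the second creates only as many objects as there are slices.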
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)