Hossein Falaki created SPARK-17790:
--------------------------------------
Summary: Support for parallelizing/creating DataFrame on data
larger than 2GB
Key: SPARK-17790
URL: https://issues.apache.org/jira/browse/SPARK-17790
Project: Spark
Issue Type: Story
Components: SparkR
Affects Versions: 2.0.1
Reporter: Hossein Falaki
This issue is a more specific version of SPARK-17762.
Supporting arguments larger than 2GB is the more general problem and arguably
harder to solve, because the limit exists in both R and the JVM (we receive the
data as a ByteArray). However, to support parallelizing R data.frames that are
larger than 2GB, we can do what PySpark does.
PySpark uses files to transfer bulk data between Python and JVM. It has worked
well for the large community of Spark Python users.
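
The file-based handoff described above can be sketched roughly as follows.
This is a minimal illustration only, not PySpark's actual code: the function
names (write_batches, read_batches, transfer_via_file) and the length-prefixed
pickle framing are assumptions made for the example, and in a real setup the
reading side would be the JVM rather than Python. The point it shows is that
data is written in small batches, so no single in-memory buffer has to hold
the whole dataset.

```python
import os
import pickle
import struct
import tempfile


def write_batches(records, batch_size, path):
    """Write records to `path` as length-prefixed pickled batches (hypothetical framing)."""
    with open(path, "wb") as f:
        for start in range(0, len(records), batch_size):
            payload = pickle.dumps(records[start:start + batch_size])
            f.write(struct.pack(">I", len(payload)))  # 4-byte big-endian batch length
            f.write(payload)


def read_batches(path):
    """Yield records back one batch at a time, so only one batch is in memory."""
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:  # end of file
                break
            (length,) = struct.unpack(">I", header)
            yield from pickle.loads(f.read(length))


def transfer_via_file(records, batch_size=2):
    """Round-trip records through a temporary file instead of one large byte array."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_batches(records, batch_size, path)
        return list(read_batches(path))
    finally:
        os.remove(path)
```

Each batch stays well under the 2GB array limit even when the file as a whole
does not, which is what would make the same approach attractive for SparkR.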