[ https://issues.apache.org/jira/browse/SPARK-14534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-14534:
---------------------------------
Labels: bulk-closed (was: )
> Should SparkContext.parallelize(List) take an Iterable instead?
> ---------------------------------------------------------------
>
> Key: SPARK-14534
> URL: https://issues.apache.org/jira/browse/SPARK-14534
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 1.6.1
> Reporter: David Wood
> Priority: Minor
> Labels: bulk-closed
>
> I am using MongoDB to read the DB, and it provides an Iterable (not a
> List) to access the results. This is similar to a JDBC ResultSet and is
> done this way so that you can process things row by row and not have to pull
> in a potentially large DB all at once. It might be nice if parallelize(List)
> could instead operate on an Iterable to allow a similar efficiency. Since a
> List is an Iterable, this would be backwards compatible. However, I'm
> new to Spark, so I'm not sure whether that might violate some other design point.
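To make the request concrete, here is a minimal plain-Java sketch. It is not Spark code, and the class and method names (`IterableToList`, `drain`, `countElements`) are hypothetical. `drain` shows the workaround a caller needs today: copy the Iterable into a List before calling `parallelize(List)`, which pulls every row into memory at once. `countElements` stands in for a method whose parameter has been widened from `List<T>` to `Iterable<T>`, showing that callers who already pass a List still compile. The lambda-based Iterable is only an assumed stand-in for a MongoDB cursor; no MongoDB driver API is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IterableToList {
    // Hypothetical helper: drains any Iterable (e.g. a database cursor) into an
    // in-memory List. This loads every element at once, which is the cost the
    // request hopes to avoid.
    static <T> List<T> drain(Iterable<T> source) {
        List<T> out = new ArrayList<>();
        for (T item : source) {
            out.add(item);
        }
        return out;
    }

    // Hypothetical method with its parameter widened from List<T> to Iterable<T>.
    // Existing callers that pass a List still compile, because List<T> extends Iterable<T>.
    static <T> int countElements(Iterable<T> source) {
        int n = 0;
        for (T ignored : source) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // A one-shot Iterable standing in for a cursor over query results.
        Iterator<String> rows = Arrays.asList("a", "b", "c").iterator();
        Iterable<String> cursor = () -> rows;

        List<String> materialized = drain(cursor);
        System.out.println(materialized);                 // prints [a, b, c]
        System.out.println(countElements(materialized));  // prints 3
    }
}
```

The sketch only demonstrates source compatibility. In Java, changing a parameter type from List to Iterable also changes the erased method descriptor, so code already compiled against the old signature would need recompiling unless the Iterable version were added as an overload.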
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)