[jira] [Updated] (SPARK-14534) Should SparkContext.parallelize(List) take an Iterable instead?

Hyukjin Kwon (JIRA) Mon, 20 May 2019 21:49:59 -0700


     [ https://issues.apache.org/jira/browse/SPARK-14534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Hyukjin Kwon updated SPARK-14534:
---------------------------------
    Labels: bulk-closed  (was: )

> Should SparkContext.parallelize(List) take an Iterable instead?
> ---------------------------------------------------------------
>
>                 Key: SPARK-14534
>                 URL: https://issues.apache.org/jira/browse/SPARK-14534
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.6.1
>            Reporter: David Wood
>            Priority: Minor
>              Labels: bulk-closed
>
> I am using MongoDB to read the DB and it provides an Iterable (and not a 
> List) to access the results.  This is similar to the ResultSet in SQL and is 
> done this way so that you can process things row by row and not have to pull 
> in a potentially large DB all at once.  It might be nice if parallelize(List) 
> could instead operate on an Iterable to allow a similar efficiency.  Since a 
> List is an Iterable, this would be backwards compatible.  However, I'm 
> new to Spark, so I am not sure if that might violate some other design point.  
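
For illustration only, here is a minimal sketch of the row-by-row idea the report describes, in plain Python with no Spark or MongoDB dependency. The names `batched` and `fake_cursor` are hypothetical and are not part of either library; `fake_cursor` merely stands in for a database cursor that hands back one row at a time. The helper pulls from an Iterable lazily and yields bounded-size lists, so a caller limited to a List-accepting API could pass one batch at a time instead of materializing the whole result set.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items, pulling from `rows` lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while True:
        # islice consumes only the next `size` items from the iterator.
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def fake_cursor(n: int) -> Iterator[Dict[str, int]]:
    """Hypothetical stand-in for a DB cursor that produces rows one by one."""
    for i in range(n):
        yield {"_id": i}


# At most two rows are held in a batch at any time while iterating.
batches = list(batched(fake_cursor(5), 2))
```

Whether batching like this is appropriate for a given job is a separate design question; the sketch only shows that an Iterable can be consumed incrementally rather than copied into one large List first.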



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

