[jira] [Created] (SPARK-14534) Should SparkContext.parallelize(List) take an Iterable instead?

David Wood (JIRA) Mon, 11 Apr 2016 06:29:09 -0700

David Wood created SPARK-14534:
----------------------------------

             Summary: Should SparkContext.parallelize(List) take an Iterable instead?
                 Key: SPARK-14534
                 URL: https://issues.apache.org/jira/browse/SPARK-14534
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 1.6.1
            Reporter: David Wood
            Priority: Minor



I am using MongoDB to read the DB, and it provides an Iterable (not a List) to 
access the results.  This is similar to the ResultSet in JDBC and is designed 
this way so that you can process results row by row instead of pulling in a 
potentially large DB all at once.  It might be nice if parallelize(List) could 
instead operate on an Iterable to allow similar efficiency.  Since a List is an 
Iterable, this change would be backwards compatible.  However, I'm new to Spark, 
so I'm not sure whether it might violate some other design point.  
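For context, here is a minimal sketch, assuming the Java API is meant 
(JavaSparkContext.parallelize takes a java.util.List; the Scala 
SparkContext.parallelize takes a Seq).  It shows the copy step a caller has to 
do today: every row of a lazy Iterable is drained into an in-memory List before 
it can be handed to parallelize.  CountingCursor and drain are hypothetical 
names used only for illustration; CountingCursor stands in for a MongoDB 
cursor and does not use the MongoDB driver API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class IterableDrain {
    // Hypothetical helper: copies every element of a (possibly lazy) Iterable
    // into an in-memory List, which is what a caller must do before calling a
    // List-based API such as JavaSparkContext.parallelize(List).
    static <T> List<T> drain(Iterable<T> source) {
        List<T> out = new ArrayList<>();
        for (T item : source) {
            out.add(item);
        }
        return out;
    }

    // Hypothetical stand-in for a database cursor: lazily yields 0..n-1 and
    // records how many rows have been pulled so far.
    static final class CountingCursor implements Iterable<Integer> {
        private final int n;
        int pulled = 0;

        CountingCursor(int n) {
            this.n = n;
        }

        @Override
        public Iterator<Integer> iterator() {
            return new Iterator<Integer>() {
                private int next = 0;

                @Override
                public boolean hasNext() {
                    return next < n;
                }

                @Override
                public Integer next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    pulled++;
                    return next++;
                }
            };
        }
    }

    public static void main(String[] args) {
        CountingCursor cursor = new CountingCursor(5);
        // Every row is pulled into driver memory before any Spark call.
        List<Integer> rows = drain(cursor);
        System.out.println(rows + " pulled=" + cursor.pulled);
    }
}
```

Running main prints "[0, 1, 2, 3, 4] pulled=5": all rows are materialized up 
front, which is the cost the Iterable-based signature proposed here is meant 
to avoid.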



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
