David Wood created SPARK-14534:
----------------------------------
Summary: Should SparkContext.parallelize(List) take an Iterable
instead?
Key: SPARK-14534
URL: https://issues.apache.org/jira/browse/SPARK-14534
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.6.1
Reporter: David Wood
Priority: Minor
I am reading from MongoDB, whose driver returns query results as an Iterable
(not a List). This is similar to a ResultSet in SQL: it lets you process
results row by row instead of pulling a potentially large DB into memory all
at once. It might be nice if parallelize(List) could instead operate on an
Iterable to allow similar efficiency. Since a List is an Iterable, this would
be backwards compatible. However, I'm new to Spark, so I'm not sure whether
that might violate some other design point.
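
To illustrate the kind of row-by-row consumption described above, here is a minimal sketch in plain Java, not part of Spark or the MongoDB driver. The class name ChunkedIterable and the method chunks are hypothetical helpers invented for this example. The helper walks an Iterable lazily and yields bounded-size Lists, so only one chunk is held in memory at a time.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper (not a Spark or MongoDB API): splits an Iterable
// into Lists of at most chunkSize elements, built one at a time on demand.
final class ChunkedIterable {
    private ChunkedIterable() {}

    static <T> Iterator<List<T>> chunks(Iterable<T> source, final int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        final Iterator<T> it = source.iterator();
        return new Iterator<List<T>>() {
            @Override
            public boolean hasNext() {
                // A further chunk exists only if the source has elements left.
                return it.hasNext();
            }

            @Override
            public List<T> next() {
                if (!it.hasNext()) {
                    throw new NoSuchElementException();
                }
                // Pull at most chunkSize elements from the source.
                List<T> chunk = new ArrayList<T>(chunkSize);
                while (chunk.size() < chunkSize && it.hasNext()) {
                    chunk.add(it.next());
                }
                return chunk;
            }
        };
    }
}
```

Each List produced this way could, in principle, be passed to the existing parallelize(List) and the resulting RDDs combined, though that is only one possible workaround. The data would still pass through the driver program, so this bounds driver memory per chunk rather than avoiding the driver altogether.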