Li Xian created SPARK-36843:
-------------------------------
Summary: Add an iterator method to Dataset
Key: SPARK-36843
URL: https://issues.apache.org/jira/browse/SPARK-36843
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.3.0
Reporter: Li Xian
The current org.apache.spark.sql.Dataset#toLocalIterator submits multiple
jobs when the Dataset has multiple partitions.
In my case, I would like to collect all partitions at once to save the job
scheduling cost, while still getting an iterator to save memory on
deserialization (instead of deserializing all rows at once, I want only one
row to be deserialized at a time during the iteration).
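
To illustrate the request, here is a minimal conceptual sketch in plain
Python. It is not Spark code and does not use any Spark API: pickle stands in
for Spark's row encoding, and the names collect_serialized and lazy_rows are
hypothetical, chosen only for this example. It shows rows being fetched in a
single pass while still in serialized form, then decoded one at a time as the
iterator advances.

```python
import pickle
from typing import Iterable, Iterator, List


def collect_serialized(partitions: Iterable[Iterable[object]]) -> List[bytes]:
    """Stand-in for a single job that fetches every partition at once.

    Each row is kept in serialized form (pickle here, as an assumption;
    Spark would use its own row encoding), so nothing is decoded yet.
    """
    return [pickle.dumps(row) for part in partitions for row in part]


def lazy_rows(blobs: List[bytes]) -> Iterator[object]:
    """Deserialize one row per iteration step instead of all rows up front."""
    for blob in blobs:
        yield pickle.loads(blob)


# Hypothetical data: two "partitions" holding three rows in total.
partitions = [[("a", 1), ("b", 2)], [("c", 3)]]

blobs = collect_serialized(partitions)  # one pass over all partitions
rows = lazy_rows(blobs)                 # generator: nothing decoded yet
first = next(rows)                      # only the first row is decoded here
```

The trade-off in this sketch is that the serialized rows are all held in
memory at once, while only one deserialized row exists at a time.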