Li Xian created SPARK-36843:
-------------------------------

             Summary: Add an iterator method to Dataset
                 Key: SPARK-36843
                 URL: https://issues.apache.org/jira/browse/SPARK-36843
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Li Xian


The current org.apache.spark.sql.Dataset#toLocalIterator submits multiple 
jobs when the Dataset has multiple partitions. 

In my case, I would like to collect all partitions at once to save the job 
scheduling cost, while still getting an iterator to save memory on 
deserialization (instead of deserializing all rows at once, I want only one 
row to be deserialized at a time during iteration).
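
The idea can be illustrated with a minimal sketch in plain Python. It does 
not use Spark and does not show any existing or proposed Dataset API: the 
names collect_serialized and lazy_rows are hypothetical, pickle stands in for 
Spark's row serialization, and the nested lists stand in for partitions. It 
only shows the shape of the request: fetch every partition's serialized rows 
in one step, then deserialize one row per iteration.

```python
import pickle
from typing import Any, Iterable, Iterator, List


def collect_serialized(partitions: Iterable[Iterable[Any]]) -> List[bytes]:
    """Hypothetical stand-in for a single job that gathers the rows of
    every partition at once, keeping each row in serialized form."""
    return [pickle.dumps(row) for part in partitions for row in part]


def lazy_rows(blobs: List[bytes]) -> Iterator[Any]:
    """Deserialize one row per step instead of all rows up front."""
    for blob in blobs:
        yield pickle.loads(blob)


# Two "partitions" of rows (assumed sample data for illustration only).
partitions = [[("a", 1), ("b", 2)], [("c", 3)]]

blobs = collect_serialized(partitions)  # one collection step for all partitions
rows = lazy_rows(blobs)                 # nothing is deserialized yet
first = next(rows)                      # only the first row is deserialized here
```

The serialized rows are still all held in driver memory in this sketch; the 
saving is only that the deserialized objects exist one at a time.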



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
