Dewey Dunnington created SEDONA-723:
---------------------------------------

             Summary: Add Arrow write format
                 Key: SEDONA-723
                 URL: https://issues.apache.org/jira/browse/SEDONA-723
             Project: Apache Sedona
          Issue Type: Improvement
            Reporter: Dewey Dunnington


In SEDONA-660, SEDONA-714, and SEDONA-717, we wired up the ArrowSerializer from
Spark Connect to accelerate transfer between the JVM and Python on the driver.
For queries whose results are arbitrarily large, or whose size is unknown when
the query is issued, this can lead to out-of-memory errors, and it would be
helpful to have an escape hatch. An Arrow write format would also give Sedona
users a useful way to build services on top of Sedona (e.g., by returning the
URLs of the written Arrow files, as described in
https://arrow.apache.org/blog/2025/01/10/arrow-result-transfer/ ).

This should probably be a feature of Spark itself; however, I don't think the 
existing conversion infrastructure is flexible enough to handle it. I'll put up 
a draft PR exploring the idea to see if there is interest!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
