paleolimbot opened a new pull request, #642:
URL: https://github.com/apache/sedona-db/pull/642

   This PR adds support for geometry columns in `to_parquet(..., 
sort_by="...")`. This is currently a special case for the Python `to_parquet()` 
function. We could instead insert an optimizer rule that does this so that 
`ORDER BY geometry` is supported in SQL, too; however, for users who need SQL, 
the workaround is not too bad. This at least allows writing mostly optimized 
GeoParquet 1.1 without SQL.
   
   ```python
   import geopandas
   import sedona.db
   
   sd = sedona.db.connect()
   
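   # Generate 10,000 random points within [-50, 50] x [-50, 50] and register a view
   sd.funcs.table.sd_random_geometry(
       "Point", 10000, seed=948, bounds=[-50, -50, 50, 50]
   ).to_view("pts", overwrite=True)
   # Assign a CRS (EPSG:4326) to the geometry column
   df = sd.sql("SELECT id, ST_SetSRID(geometry, 4326) AS geometry FROM pts")
   # Sort rows by the geometry column while writing GeoParquet 1.1
   df.to_parquet(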
       "sorted.parquet",
       sort_by="geometry",
       geoparquet_version="1.1",
       max_row_group_size=100000,
   )
   ```
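The PR text does not say how rows are ordered when `sort_by` names a geometry column. A common approach is to sort on a space-filling curve such as the Hilbert curve, so that rows that are close in space land in the same row group; GeoParquet 1.1's optional bounding-box covering column then lets readers skip row groups whose bbox statistics don't intersect a query window. The sketch below is plain Python with no SedonaDB dependency and assumes a Hilbert key over a 1024 x 1024 grid (an assumption, not necessarily what SedonaDB does). The helper names and the 100-row chunk size are hypothetical and chosen for illustration; the script compares the mean bounding-box area of consecutive chunks before and after sorting.

```python
import random

# ASSUMPTION: a Hilbert-curve index is used as the geometry sort key here;
# the PR does not state which ordering sort_by="geometry" actually uses.


def _rotate(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the curve stays continuous."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n, x, y):
    """Distance along the Hilbert curve of an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def to_grid(v, lo, hi, n):
    """Map a coordinate in [lo, hi] to an integer cell in [0, n - 1]."""
    cell = int((v - lo) / (hi - lo) * (n - 1))
    return min(max(cell, 0), n - 1)


def chunk_bboxes(points, rows_per_chunk):
    """(xmin, ymin, xmax, ymax) of each consecutive chunk of rows."""
    boxes = []
    for i in range(0, len(points), rows_per_chunk):
        chunk = points[i : i + rows_per_chunk]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def mean_area(boxes):
    """Average bounding-box area over all chunks."""
    return sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in boxes) / len(boxes)


GRID = 1024  # hypothetical grid resolution for the Hilbert key


def hilbert_key(p):
    """Hilbert sort key for a point in the [-50, 50] x [-50, 50] extent."""
    return hilbert_index(GRID, to_grid(p[0], -50, 50, GRID), to_grid(p[1], -50, 50, GRID))


# Same count and bounds as the example above, but generated with Python's RNG,
# so these are NOT the same points that sd_random_geometry produces.
rng = random.Random(948)
points = [(rng.uniform(-50, 50), rng.uniform(-50, 50)) for _ in range(10000)]
sorted_pts = sorted(points, key=hilbert_key)

# Hypothetical 100-row chunks standing in for small row groups
unsorted_area = mean_area(chunk_bboxes(points, 100))
sorted_area = mean_area(chunk_bboxes(sorted_pts, 100))
print(f"mean chunk bbox area: unsorted={unsorted_area:.1f}, sorted={sorted_area:.1f}")
```

With uniformly random input, each unsorted chunk spans nearly the whole extent, while Hilbert-sorted chunks should each cover a small, compact region; that compactness is what makes per-row-group bbox pruning useful.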


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.