paleolimbot opened a new pull request, #642:
URL: https://github.com/apache/sedona-db/pull/642
This PR adds support for geometry columns in `to_parquet(...,
sort_by="...")`. This is currently a special case in the Python `to_parquet`
function. We could instead insert an optimizer rule that does this, which
would also let us support ORDER BY geometry in SQL; however, if SQL is needed,
the workaround is not too bad. This at least allows writing mostly optimized
GeoParquet 1.1 without SQL.
```python
import geopandas
import sedona.db
sd = sedona.db.connect()
sd.funcs.table.sd_random_geometry(
    "Point", 10000, seed=948, bounds=[-50, -50, 50, 50]
).to_view("pts", overwrite=True)

df = sd.sql("SELECT id, ST_SetSRID(geometry, 4326) AS geometry FROM pts")
df.to_parquet(
    "sorted.parquet",
    sort_by="geometry",
    geoparquet_version="1.1",
    max_row_group_size=100000,
)
```
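
For intuition on why sorting by geometry helps, here is a minimal standalone sketch that does not use sedona-db. It assumes a Morton (Z-order) key purely as a stand-in spatial ordering; the ordering this PR actually applies is not described here and may differ. The helper names (`interleave_bits`, `morton_key`, `total_group_bbox_area`) and the group size of 1000 are hypothetical, chosen only for illustration. The sketch compares the summed bounding-box area of fixed-size row groups before and after sorting; smaller boxes mean a reader filtering by bounding box can skip more groups.

```python
import random


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Morton (Z-order) code: interleave the low `bits` bits of x and y."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_key(pt, bounds, bits: int = 16) -> int:
    """Map a point to an integer grid cell within bounds, then to a Z-order key."""
    xmin, ymin, xmax, ymax = bounds
    scale = (1 << bits) - 1
    gx = int((pt[0] - xmin) / (xmax - xmin) * scale)
    gy = int((pt[1] - ymin) / (ymax - ymin) * scale)
    return interleave_bits(gx, gy, bits)


def total_group_bbox_area(points, group_size: int) -> float:
    """Sum the bounding-box areas of consecutive chunks ("row groups")."""
    total = 0.0
    for start in range(0, len(points), group_size):
        chunk = points[start:start + group_size]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        total += (max(xs) - min(xs)) * (max(ys) - min(ys))
    return total


# Assumed setup: uniform random points in the same bounds as the example above.
bounds = (-50.0, -50.0, 50.0, 50.0)
rng = random.Random(948)
points = [(rng.uniform(-50, 50), rng.uniform(-50, 50)) for _ in range(10000)]

# Stand-in for "sort by geometry": order points along the Z-order curve.
sorted_points = sorted(points, key=lambda p: morton_key(p, bounds))

unsorted_area = total_group_bbox_area(points, 1000)
sorted_area = total_group_bbox_area(sorted_points, 1000)
print(f"unsorted: {unsorted_area:.0f}, sorted: {sorted_area:.0f}")
```

With unsorted input, each group's bounding box covers nearly the whole extent, so its statistics are of little use for pruning; after the spatial sort, each group covers a much smaller region.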