GitHub user paleolimbot added a comment to the discussion: Observations from R and Python benchmarks: performance bottlenecks and optimization ideas for sedona-db
A few updates here!

> Would the team be open to adding a native .to_polars() method (or similar) to
> keep geometries in efficient binary format?

Because a SedonaDB DataFrame implements the Arrow PyCapsule protocol, I think you can just do `pl.from_arrow(df)` (i.e., no intermediary `to_arrow_table()`). I'm not sure whether Polars can consume the stream incrementally (DuckDB can), but if it can, it would avoid loading the entire table into memory at once.

```python
import duckdb
import polars as pl
import sedona.db

sd = sedona.db.connect()
url = "https://github.com/geoarrow/geoarrow-data/releases/download/v0.2.0/ns-water_water-point.parquet"
df = sd.read_parquet(url)

# For polars
pl.from_arrow(df)

# For duckdb you can just treat 'df' like a table as of DuckDB 1.5.0
duckdb.sql("SELECT * FROM df LIMIT 5")
```

We should at least document this... I'm not personally keen on `.to_polars()` and `.to_duckdb()`, but they are also not hard to implement.

> R: Direct File Ingestion (GDAL/OGR)

It's not quite as good as Python's `sd.read_pyogrio()`, but I did implement `sd_read_sf()`, which takes care of many use cases.

```r
# install.packages(
#   "sedonadb",
#   repos = c("https://apache.r-universe.dev", "https://cloud.r-project.org")
# )
library(sedonadb)

tf <- tempfile(fileext = ".fgb")
curl::curl_download(
  "https://github.com/geoarrow/geoarrow-data/releases/download/v0.2.0/ns-water_elevation.fgb",
  tf
)

system.time(sedonadb::sd_read_sf(tf))
#>    user  system elapsed
#>   1.126   0.336   1.658
system.time(sf::read_sf(tf))
#>    user  system elapsed
#>  47.515   3.244  51.532
```

> Roadmap: Complex Linestring Operations

We don't have `ST_Subdivide()` yet, but we do have `ST_LineMerge()` as of the forthcoming release!
```python
# pip install "apache-sedona[db]" --force-reinstall --pre --extra-index-url=https://pypi.fury.io/sedona-nightlies/
import sedona.db

sd = sedona.db.connect()
sd.sql("SELECT ST_LineMerge(ST_GeomFromWKT('MULTILINESTRING ((0 0, 1 0), (1 0, 1 1))')) AS g").show()
#> ┌─────────────────────────┐
#> │            g            │
#> │         geometry        │
#> ╞═════════════════════════╡
#> │ LINESTRING(0 0,1 0,1 1) │
#> └─────────────────────────┘
```

> I opened a specific issue:

I think we got this one solved!

GitHub link: https://github.com/apache/sedona/discussions/2576#discussioncomment-15958979
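For anyone unfamiliar with the operation, here is a minimal pure-Python sketch of what a line merge does conceptually, using the same two segments as the `ST_LineMerge()` example above. This is not SedonaDB's implementation: `line_merge` and `_join` are hypothetical names made up for illustration, and the sketch assumes the usual rule that lines are joined only at nodes shared by exactly two line endpoints.

```python
from collections import Counter


def _join(a, b, degree):
    """Return a and b joined at a shared endpoint, or None if they cannot be joined."""
    # Try every orientation so that reversed inputs can still be stitched together.
    for first in (a, a[::-1]):
        for second in (b, b[::-1]):
            node = first[-1]
            # Assumption: only merge where exactly two line ends meet.
            if node == second[0] and degree[node] == 2:
                return first + second[1:]
    return None


def line_merge(lines):
    """Hypothetical helper: greedily merge linestrings given as lists of (x, y) tuples."""
    lines = [list(line) for line in lines if len(line) >= 2]

    # Count how many line endpoints land on each coordinate.
    degree = Counter()
    for line in lines:
        degree[line[0]] += 1
        degree[line[-1]] += 1

    merged = True
    while merged:
        merged = False
        for i in range(len(lines)):
            for j in range(i + 1, len(lines)):
                joined = _join(lines[i], lines[j], degree)
                if joined is not None:
                    lines[i] = joined
                    del lines[j]
                    merged = True
                    break
            if merged:
                break
    return lines


segments = [[(0, 0), (1, 0)], [(1, 0), (1, 1)]]
print(line_merge(segments))  # [[(0, 0), (1, 0), (1, 1)]]
```

The endpoint count is what keeps three segments that meet at one point from being merged arbitrarily; in that case the sketch returns all three lines unchanged.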
