Re: [D] Observations from R and Python benchmarks: performance bottlenecks and optimization ideas for sedona-db [sedona]

via GitHub Sat, 28 Feb 2026 20:20:35 -0800


GitHub user paleolimbot added a comment to the discussion: Observations from R 
and Python benchmarks: performance bottlenecks and optimization ideas for 
sedona-db


A few updates here!

> Would the team be open to adding a native .to_polars() method (or similar) to 
> keep geometries in efficient binary format?

Because a SedonaDB DataFrame implements the Arrow PyCapsule protocol, I think 
you can just do `pl.from_arrow(df)` (i.e., no intermediary `to_arrow_table()`). 
I'm not sure if polars can stream (DuckDB can), but if it can, it would be able 
to avoid loading the entire table into memory at once.

```python
import duckdb
import polars as pl
import sedona.db

sd = sedona.db.connect()

url = "https://github.com/geoarrow/geoarrow-data/releases/download/v0.2.0/ns-water_water-point.parquet"
df = sd.read_parquet(url)

# For polars
pl.from_arrow(df)

# For duckdb you can just treat 'df' like a table as of DuckDB 1.5.0
duckdb.sql("SELECT * FROM df LIMIT 5")
```

We should at least document this...I'm not personally keen on `.to_polars()` 
and `.to_duckdb()` but they are also not hard to implement.

> R: Direct File Ingestion (GDAL/OGR)

It's not quite as good as Python's `sd.read_pyogrio()`, but I did implement 
`sd_read_sf()` which takes care of many use cases.

```r
# install.packages(
#   "sedonadb",
#   repos = c("https://apache.r-universe.dev", "https://cloud.r-project.org")
# )

library(sedonadb)

tf <- tempfile(fileext = ".fgb")
curl::curl_download(
  "https://github.com/geoarrow/geoarrow-data/releases/download/v0.2.0/ns-water_elevation.fgb",
  tf
)

system.time(sedonadb::sd_read_sf(tf))
#>    user  system elapsed 
#>   1.126   0.336   1.658

system.time(sf::read_sf(tf))
#>    user  system elapsed 
#>  47.515   3.244  51.532
```

> Roadmap: Complex Linestring Operations

We don't have `ST_Subdivide()` yet but we do have `ST_LineMerge()` as of the 
forthcoming release!

```python
# pip install "apache-sedona[db]" --force-reinstall --pre \
#   --extra-index-url=https://pypi.fury.io/sedona-nightlies/
import sedona.db

sd = sedona.db.connect()
sd.sql(
    "SELECT ST_LineMerge(ST_GeomFromWKT("
    "'MULTILINESTRING ((0 0, 1 0), (1 0, 1 1))')) AS g"
).show()
#> ┌─────────────────────────┐
#> │            g            │
#> │         geometry        │
#> ╞═════════════════════════╡
#> │ LINESTRING(0 0,1 0,1 1) │
#> └─────────────────────────┘
```
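
For anyone unfamiliar with what line merging does, the idea can be sketched in plain Python. This is only an illustration of the concept and not how SedonaDB implements `ST_LineMerge()`: the function name `merge_lines` is hypothetical, lines are plain lists of `(x, y)` tuples, and the sketch assumes lines are only joined at points where exactly two line ends meet.

```python
from collections import Counter


def merge_lines(lines):
    """Join linestrings end to end wherever exactly two line ends meet."""
    lines = [list(line) for line in lines]
    merged = True
    while merged:
        merged = False
        # Count how many line ends touch each point
        ends = Counter()
        for line in lines:
            ends[line[0]] += 1
            ends[line[-1]] += 1
        for i in range(len(lines)):
            for j in range(i + 1, len(lines)):
                a, b = lines[i], lines[j]
                joined = None
                if a[-1] == b[0] and ends[a[-1]] == 2:
                    joined = a + b[1:]
                elif a[-1] == b[-1] and ends[a[-1]] == 2:
                    # b runs the other way, so reverse it before appending
                    joined = a + b[::-1][1:]
                elif a[0] == b[-1] and ends[a[0]] == 2:
                    joined = b + a[1:]
                elif a[0] == b[0] and ends[a[0]] == 2:
                    joined = b[::-1] + a[1:]
                if joined is not None:
                    lines[i] = joined
                    del lines[j]
                    merged = True
                    break
            if merged:
                break
    return lines


# Same input as the MULTILINESTRING in the SQL example above
result = merge_lines([[(0, 0), (1, 0)], [(1, 0), (1, 1)]])
```

Here `result` holds a single line through `(0, 0)`, `(1, 0)`, and `(1, 1)`, which corresponds to the `LINESTRING(0 0,1 0,1 1)` shown in the SQL output.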

> I opened a specific issue:

I think we got this one solved!


GitHub link: 
https://github.com/apache/sedona/discussions/2576#discussioncomment-15958979

