Re: [PR] [DOCS] Draft SedonaDB 0.2.0 release post [sedona]

via GitHub Thu, 04 Dec 2025 15:53:54 -0800


jiayuasu commented on code in PR #2540:
URL: https://github.com/apache/sedona/pull/2540#discussion_r2590968605



##########
docs/blog/posts/intro-sedonadb-0-2.md:
##########
@@ -0,0 +1,250 @@
+---
+date:
+  created: 2025-12-01
+links:
+  - SedonaDB: https://sedona.apache.org/sedonadb/
+authors:
+  - dewey
+  - kristin
+  - feng
+  - peter
+  - jess
+  - jia
+  - matt_powers
+title: "SedonaDB 0.2.0 Release"
+---
+
+<!--
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+-->
+
+The Apache Sedona community is excited to announce the release of 
[SedonaDB](https://sedona.apache.org/sedonadb) version 0.2.0!
+
+SedonaDB is the first open-source, single-node analytical database engine that 
treats spatial data as a first-class citizen. It is developed as a subproject 
of Apache Sedona. This release consists of [136 resolved 
issues](https://github.com/apache/sedona-db/milestone/1?closed=1) including 40 
new functions from 17 contributors.
+
+Apache Sedona powers large-scale geospatial processing on distributed engines 
like Spark (SedonaSpark), Flink (SedonaFlink), and Snowflake (SedonaSnow). 
SedonaDB extends the Sedona ecosystem with a single-node engine optimized for 
small-to-medium data analytics, delivering the simplicity and speed that 
distributed systems often cannot.
+
+<!-- more -->
+
+## Release Highlights
+
+- Improved spatial function coverage
+- GDAL/OGR spatial file format read support
+- GeoParquet 1.1 write support
+- Python user-defined function support
+- Initial Raster data type implementation
+- Release on `crates.io`
+- Build system improvements
+
+For a complete list of changes since SedonaDB 0.1.0 see the [milestone for 
0.2.0](https://github.com/apache/sedona-db/milestone/1?closed=1).
+
+## Improved spatial function coverage
+
+Since the 0.1.0 release we have been fortunate to work with contributors to add 
40 new `ST_` and `RS_` functions to our growing catalogue. Users of rs_height, 
rs_scalex, rs_scaley, rs_skewx, rs_skewy, rs_upperleftx, rs_upperlefty, 
rs_width, st_azimuth, st_boundary, st_crosses, st_dump, st_endpoint, 
st_geometryfromtext, st_geometryn, st_isclosed, st_iscollection, st_isring, 
st_issimple, st_isvalid, st_isvalidreason, st_makevalid, st_minimumclearance, 
st_minimumclearanceline, st_npoints, st_numgeometries, st_overlaps, st_pointn, 
st_points, st_polygonize, st_polygonize_agg, st_reverse, st_simplify, 
st_simplifypreservetopology, st_snap, st_startpoint, st_translate, 
st_unaryunion, and st_zmflag will be pleased to know that these functions are 
now available in SedonaDB workflows.
+
+Thank you to [Abeeujah](https://github.com/Abeeujah), 
[ayushjariyal](https://github.com/ayushjariyal), 
[jesspav](https://github.com/jesspav), 
[joonaspessi](https://github.com/joonaspessi), 
[petern48](https://github.com/petern48), and 
[yutannihilation](https://github.com/yutannihilation) for these contributions! 
(With a special thanks to [petern48](https://github.com/petern48) for reviewing 
nearly all of them!)
+
+## GDAL/OGR spatial file format read support
+
+Whereas SedonaDB 0.1.0 launched with GeoParquet read support and GeoPandas 
interoperability, support for file formats like GeoPackage, Shapefile, and 
FlatGeoBuf inherited the limitations of GeoPandas (notably, the materialization 
of an entire layer in memory as a Pandas dataframe). The package powering 
GeoPandas read support ([pyogrio](https://github.com/geopandas/pyogrio)) also 
exposes the [underlying provider's (GDAL/OGR) native Arrow 
interface](https://gdal.org/en/stable/development/rfc/rfc86_column_oriented_api.html),
 which is the exact format that SedonaDB uses under the hood! This allowed us 
to add support for dozens of vector formats at once, wired directly into 
DataFusion's flexible `FileFormat` API. Users can now read from spatial file 
formats just as they can from Parquet:
+
+```python
+# pip install "apache-sedona[db]"
+import sedona.db
+
+sd = sedona.db.connect()
+url = "https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/example-crs/files/example-crs_vermont-utm.fgb"
+sd.read_pyogrio(url).to_pandas().plot()
+```
+
+This works for local files, `https://` URLs (via `/vsicurl/`), and zipped 
files (via `/vsizip/`). Globs (e.g., `sd.read_pyogrio("path/to/*.gpkg")`) 
resolving to one or more local files are also supported.
+
+Like SedonaDB's GeoParquet support, spatial filters like `ST_Intersects()` are 
translated into the GDAL/OGR scan where possible to take advantage of embedded 
spatial indexes in formats like GeoPackage, FlatGeoBuf, and Shapefile. For 
example, we can query small areas of a [huge FlatGeoBuf file hosted 
elsewhere](https://flatgeobuf.org/examples/maplibre/large.html) without 
scanning the entire file:
+
+```python
+# 12 GB file
+url = "https://flatgeobuf.septima.dk/population_areas.fgb"
+sd.read_pyogrio(url).to_view("population_areas")
+
+wkt = "POLYGON ((-73.978329 40.767412, -73.950005 40.767412, -73.950005 40.795098, -73.978329 40.795098, -73.978329 40.767412))"
+sd.sql(
+    f"""
+SELECT sum(population::INTEGER) FROM population_areas
+WHERE ST_Intersects(wkb_geometry, ST_SetSRID(ST_GeomFromWKT('{wkt}'), 4326))
+"""
+).show()
+# > ┌──────────────────────────────────┐
+# > │ sum(population_areas.population) │
+# > │               int64              │
+# > ╞══════════════════════════════════╡
+# > │                           256251 │
+# > └──────────────────────────────────┘
+```
+
+## GeoParquet 1.1 Write Support
+
+The initial version of SedonaDB launched with basic write support for 
GeoParquet files but did not support GeoParquet 1.1, the latest version of the 
specification, which lets readers fetch only small portions of the resulting 
Parquet file. With the latest release, `DataFrame.to_parquet("path/to/parquet", 
geoparquet_version="1.1")` adds a `bbox` column so that calls like 
`sd.read_parquet()` followed by a `WHERE ST_Intersects()` query read only a 
portion of the input file.
+
+```python
+# pip install "apache-sedona[db]"
+import sedona.db
+
+sd = sedona.db.connect()
+url = "https://github.com/geoarrow/geoarrow-data/releases/download/v0.2.0/ns-water_water-point.fgb"
+
+sd.read_pyogrio(url).to_parquet("water_point.parquet", geoparquet_version="1.1")
+```
+
+Shrinking the default row group size to ~100,000 and applying a spatial sort 
may improve read performance by ensuring that row groups contain related 
features.
+
+```python
+sd.sql("SET datafusion.execution.parquet.max_row_group_size = 100000")
+
+sd.read_pyogrio(url).to_view("water_point")
+
+sd.sql(
+    """
+SELECT * FROM water_point
+ORDER BY sd_order(geometry)
+"""
+).to_parquet("water_point.parquet", geoparquet_version="1.1")
+```
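
As a rough illustration of why smaller row groups plus a spatial sort can help, the sketch below chunks points into row groups, computes a bounding box per group, and counts how many groups a query window would touch. It is plain Python and does not use SedonaDB. The Morton (Z-order) key is an assumption standing in for whatever ordering `sd_order()` actually applies, and `morton_key`, `row_group_bboxes`, and `groups_to_read` are hypothetical helper names.

```python
import random


def morton_key(x, y, bits=16):
    """Interleave the bits of two non-negative integers (Z-order curve)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def row_group_bboxes(points, group_size):
    """Chunk points into row groups; return (xmin, ymin, xmax, ymax) per group."""
    boxes = []
    for start in range(0, len(points), group_size):
        chunk = points[start:start + group_size]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def groups_to_read(boxes, query):
    """Count row groups whose bounding box intersects the query window."""
    qxmin, qymin, qxmax, qymax = query
    return sum(
        1
        for xmin, ymin, xmax, ymax in boxes
        if xmin <= qxmax and xmax >= qxmin and ymin <= qymax and ymax >= qymin
    )


# A 64 x 64 grid of points written in random order, 64 points per row group.
random.seed(0)
points = [(x, y) for x in range(64) for y in range(64)]
random.shuffle(points)
query = (10, 10, 13, 13)

unsorted_reads = groups_to_read(row_group_bboxes(points, 64), query)

# The same points, sorted along the Z-order curve before chunking.
sorted_points = sorted(points, key=lambda p: morton_key(p[0], p[1]))
sorted_reads = groups_to_read(row_group_bboxes(sorted_points, 64), query)

print(f"row groups touched without sort: {unsorted_reads}")
print(f"row groups touched with sort:    {sorted_reads}")
```

With the sorted layout each row group covers a compact 8 x 8 block, so the query window falls inside a single group. In the shuffled layout each group's bounding box spans most of the grid, so far more groups have to be read.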
+
+## Python User-Defined Function Support
+
+User-defined functions (UDFs) are essential components of many workflows in 
modern DataFrame engines like Spark, DataFusion, and DuckDB to capture 
user-specific logic that is difficult or impossible to implement by simply 
composing existing functions. Just as [SedonaSpark provides a vectorized UDF 
framework for geometry 
types](https://sedona.apache.org/latest/tutorial/sql/#spatial-vectorized-udfs-python-only),
 SedonaDB 0.2.0 exposes a framework that allows user-specific logic (including 
but not limited to logic involving geometry!) to be referenced from SQL 
workflows. For example, a UDF implementation of `ST_Buffer()` could be written 
as:
+
+```python
+import pyarrow as pa
+import sedona.db
+from sedonadb import udf
+import shapely
+import geoarrow.pyarrow as ga
+
+sd = sedona.db.connect()
+
+
+@udf.arrow_udf(ga.wkb(), [udf.GEOMETRY, udf.NUMERIC])
+def shapely_udf(geom, distance):
+    geom_wkb = pa.array(geom.storage.to_array())
+    distance = pa.array(distance.to_array())
+    geom = shapely.from_wkb(geom_wkb)
+    result_shapely = shapely.buffer(geom, distance)
+    return pa.array(shapely.to_wkb(result_shapely))
+
+
+sd.register_udf(shapely_udf)
+sd.sql("SELECT shapely_udf(ST_Point(0, 0), 2.0) as col").show()
+# > ┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
+# > │                                                col                                               │
+# > │                                             geometry                                             │
+# > ╞══════════════════════════════════════════════════════════════════════════════════════════════════╡
+# > │ POLYGON((2 0,1.9615705608064609 -0.3901806440322565,1.8477590650225735 -0.7653668647301796,1.66… │
+# > └──────────────────────────────────────────────────────────────────────────────────────────────────┘
+```
+
+See the [documentation page for 
`arrow_udf()`](https://sedona.apache.org/sedonadb/latest/reference/python/#sedonadb.udf.arrow_udf)
 for more examples and documentation.
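
The UDF above receives whole batches of WKB-encoded geometries and returns a batch of WKB. As a dependency-free sketch of that "batch of WKB in, batch of WKB out" shape, the snippet below decodes and re-encodes 2D points with the standard library only. It does not use the `sedonadb` API, handles only little-endian WKB points, and `encode_point`, `decode_point`, and `translate_x_batch` are hypothetical names chosen for illustration.

```python
import struct


def encode_point(x, y):
    """Encode a 2D point as little-endian WKB (byte order 1, geometry type 1)."""
    return struct.pack("<BIdd", 1, 1, x, y)


def decode_point(wkb):
    """Decode a little-endian WKB point into an (x, y) tuple."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", wkb)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("this sketch only handles little-endian WKB points")
    return x, y


def translate_x_batch(geoms_wkb, offsets):
    """Shift each point's x by the matching offset, passing nulls through."""
    out = []
    for wkb, dx in zip(geoms_wkb, offsets):
        if wkb is None or dx is None:
            out.append(None)
            continue
        x, y = decode_point(wkb)
        out.append(encode_point(x + dx, y))
    return out


batch = [encode_point(0.0, 0.0), None, encode_point(1.5, 2.0)]
result = translate_x_batch(batch, [2.0, 1.0, -0.5])
print([decode_point(g) if g is not None else None for g in result])
```

A real UDF would normally delegate the geometry work to a library such as shapely, as in the example above. The point here is only that the function sees entire columns at once and must return one output value per input row, including nulls.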
+
+## Initial Raster data type implementation
+
+The [raster data type support in Sedona 
Spark](https://sedona.apache.org/latest/tutorial/raster) is a popular feature, 
and in SedonaDB 0.2.0 we are pleased to offer a raster data type and a few 
basic functions!
+
+```python
+import sedona.db
+
+sd = sedona.db.connect()
+
+sd.sql("SELECT RS_Width(RS_Example()) as width").show()
+# > ┌────────┐
+# > │  width │
+# > │ uint64 │
+# > ╞════════╡
+# > │     64 │
+# > └────────┘
+```
+
+For more information or to get involved see the [raster support umbrella 
issue](https://github.com/apache/sedona-db/issues/246). Thank you to 
[jesspav](https://github.com/jesspav) for driving this work!
+
+## Release on `crates.io`
+
+Because our initial development of SedonaDB was closely tied to improvements 
we were experimenting with in some of our dependency crates from the GeoRust 
ecosystem, our first release of SedonaDB contained git dependencies and 
references to the forks we had used in our experiments. While the first release 
*could* be used in Rust projects via a git dependency, this prevented any 
downstream project from releasing to crates.io themselves and did not make 
clear that we do expose a public Rust API that can be used alongside any 
DataFusion-based project! Rust projects can use the components we provide or 
use the pre-assembled `SedonaContext`.
+
+SedonaDB 0.2.0, in addition to being [released to 
crates.io](https://crates.io/crates/sedona), includes [a Rust 
example](https://github.com/apache/sedona-db/tree/main/examples/sedonadb-rust) 
to get interested Rust projects started:

Review Comment:
   should this be https://crates.io/crates/sedonadb?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.