Re: [PR] docs: add sedonadb programming guide [sedona-db]

via GitHub Thu, 11 Sep 2025 23:53:11 -0700


paleolimbot commented on code in PR #64:
URL: https://github.com/apache/sedona-db/pull/64#discussion_r2342140956



##########
docs/programming-guide.md:
##########


Review Comment:
   Can you include this as the `.ipynb` instead of a `.md` file? (Then it is 
easier to maintain and check in CI).



##########
docs/programming-guide.md:
##########
@@ -0,0 +1,239 @@
+# SedonaDB Guide
+
+This page explains how to process vector data with SedonaDB.
+
+You will learn how to create SedonaDB DataFrames, run spatial queries, and perform I/O operations with various types of files.
+
+Let’s start by establishing a SedonaDB connection.
+
+## Establish SedonaDB connection
+
+Here’s how to create the SedonaDB connection:
+
+```python
+import sedona.db
+
+sd = sedona.db.connect()
+```
+
+Now let’s see how to create SedonaDB DataFrames.
+
+## Create SedonaDB DataFrame
+
+**Manually creating SedonaDB DataFrame**
+
+Here’s how to manually create a SedonaDB DataFrame:
+
+```python
+df = sd.sql("""
+SELECT * FROM (VALUES
+    ('one', ST_GeomFromWkt('POINT(1 2)')),
+    ('two', ST_GeomFromWkt('POLYGON((-74.0 40.7, -74.0 40.8, -73.9 40.8, -73.9 40.7, -74.0 40.7))')),
+    ('three', ST_GeomFromWkt('LINESTRING(-74.0060 40.7128, -73.9352 40.7306, -73.8561 40.8484)')))
+AS t(val, point)""")
+```
+
+Check the contents of the DataFrame:
+
+```python
+df.show()
+```
+
+```
+┌───────┬───────────────────────────────────────────────────────────────┐
+│  val  ┆                             point                             │
+│  utf8 ┆                              wkb                              │
+╞═══════╪═══════════════════════════════════════════════════════════════╡
+│ one   ┆ POINT(1 2)                                                    │
+├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+│ two   ┆ POLYGON((-74 40.7,-74 40.8,-73.9 40.8,-73.9 40.7,-74 40.7))   │
+├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+│ three ┆ LINESTRING(-74.006 40.7128,-73.9352 40.7306,-73.8561 40.8484) │
+└───────┴───────────────────────────────────────────────────────────────┘
+```
+
+Check the type of the DataFrame.
+
+```python
+type(df)
+
+sedonadb.dataframe.DataFrame
+```
+
+**Create SedonaDB DataFrame from files in S3**
+
+For most production applications, you will create SedonaDB DataFrames by reading data from a file.  Let’s see how to read GeoParquet files in AWS S3 into a SedonaDB DataFrame.
+
+Import the required libraries and set environment variables:
+
+```python
+import os
+
+os.environ["AWS_SKIP_SIGNATURE"] = "true"
+os.environ["AWS_DEFAULT_REGION"] = "us-west-2"

Review Comment:
   This can now be inlined into the `read_parquet()` call: `read_parquet(..., 
options={"aws.skip_signature": True, "aws.default_region": "us-west-2"})`.
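
A minimal sketch of the suggestion above, assuming `read_parquet()` accepts an `options` keyword with the keys quoted in this comment. The helper name `read_division_area` and the two constants are hypothetical; `sd` stands for the connection returned by `sedona.db.connect()` in the guide.

```python
# S3 settings passed per call instead of through os.environ.
# The keys are the ones quoted in the review comment (assumed, not verified here).
S3_OPTIONS = {
    "aws.skip_signature": True,
    "aws.default_region": "us-west-2",
}

# Same bucket path the guide reads from, split only for line length.
DIVISION_AREA_PATH = (
    "s3://overturemaps-us-west-2/release/2025-08-20.0/"
    "theme=divisions/type=division_area/"
)


def read_division_area(sd):
    """Read the Overture division_area table and register it as a view.

    `sd` is a SedonaDB connection, e.g. from `sedona.db.connect()`.
    """
    df = sd.read_parquet(DIVISION_AREA_PATH, options=S3_OPTIONS)
    df.to_view("division_area")
    return df
```

With this shape the `import os` lines and both `os.environ[...]` assignments could be dropped from the guide, since nothing else in the page appears to rely on them.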



##########
docs/programming-guide.md:
##########
@@ -0,0 +1,239 @@
+# SedonaDB Guide
+
+This page explains how to process vector data with SedonaDB.
+
+You will learn how to create SedonaDB DataFrames, run spatial queries, and perform I/O operations with various types of files.
+
+Let’s start by establishing a SedonaDB connection.
+
+## Establish SedonaDB connection
+
+Here’s how to create the SedonaDB connection:
+
+```python
+import sedona.db
+
+sd = sedona.db.connect()
+```
+
+Now let’s see how to create SedonaDB DataFrames.
+
+## Create SedonaDB DataFrame
+
+**Manually creating SedonaDB DataFrame**
+
+Here’s how to manually create a SedonaDB DataFrame:
+
+```python
+df = sd.sql("""
+SELECT * FROM (VALUES
+    ('one', ST_GeomFromWkt('POINT(1 2)')),
+    ('two', ST_GeomFromWkt('POLYGON((-74.0 40.7, -74.0 40.8, -73.9 40.8, -73.9 40.7, -74.0 40.7))')),
+    ('three', ST_GeomFromWkt('LINESTRING(-74.0060 40.7128, -73.9352 40.7306, -73.8561 40.8484)')))
+AS t(val, point)""")
+```
+
+Check the contents of the DataFrame:
+
+```python
+df.show()
+```
+
+```
+┌───────┬───────────────────────────────────────────────────────────────┐
+│  val  ┆                             point                             │
+│  utf8 ┆                              wkb                              │
+╞═══════╪═══════════════════════════════════════════════════════════════╡
+│ one   ┆ POINT(1 2)                                                    │
+├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+│ two   ┆ POLYGON((-74 40.7,-74 40.8,-73.9 40.8,-73.9 40.7,-74 40.7))   │
+├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+│ three ┆ LINESTRING(-74.006 40.7128,-73.9352 40.7306,-73.8561 40.8484) │
+└───────┴───────────────────────────────────────────────────────────────┘
+```
+
+Check the type of the DataFrame.
+
+```python
+type(df)
+
+sedonadb.dataframe.DataFrame
+```
+
+**Create SedonaDB DataFrame from files in S3**
+
+For most production applications, you will create SedonaDB DataFrames by reading data from a file.  Let’s see how to read GeoParquet files in AWS S3 into a SedonaDB DataFrame.
+
+Import the required libraries and set environment variables:
+
+```python
+import os
+
+os.environ["AWS_SKIP_SIGNATURE"] = "true"
+os.environ["AWS_DEFAULT_REGION"] = "us-west-2"
+```
+
+Read the Overture divisions table into a SedonaDB DataFrame and create a view:
+
+```python
+sd.read_parquet(
+    "s3://overturemaps-us-west-2/release/2025-08-20.0/theme=divisions/type=division_area/"
+).to_view("division_area")
+```
+
+Let’s now run some spatial queries.
+
+**Read from GeoPandas DataFrame**
+
+This section shows how to convert a GeoPandas DataFrame into a SedonaDB DataFrame.
+
+Start by reading a FlatGeobuf file into a GeoPandas DataFrame:
+
+```python
+import geopandas as gpd
+
+path = "https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities.fgb"
+gdf = gpd.read_file(path)
+```
+
+Now convert the GeoPandas DataFrame to a SedonaDB DataFrame and view three rows of content:
+
+```
+df = sd.create_data_frame(gdf)
+df.show(3)
+
+┌──────────────┬──────────────────────────────┐
+│     name     ┆           geometry           │
+│     utf8     ┆           geometry           │
+╞══════════════╪══════════════════════════════╡
+│ Vatican City ┆ POINT(12.4533865 41.9032822) │
+├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+│ San Marino   ┆ POINT(12.4417702 43.9360958) │
+├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+│ Vaduz        ┆ POINT(9.5166695 47.1337238)  │
+└──────────────┴──────────────────────────────┘
+```
+
+## Spatial queries
+
+Let’s see how to run spatial operations like filtering, joins, and clustering algorithms.
+
+***Spatial filtering***
+
+Let’s run a spatial filtering operation to fetch all the objects in the following polygon:
+
+```python
+nova_scotia_bbox_wkt = "POLYGON((-66.5 43.4, -66.5 47.1, -59.8 47.1, -59.8 43.4, -66.5 43.4))"
+
+ns = sd.sql(f"""
+SELECT country, region, names, geometry FROM division_area
+WHERE ST_Intersects(geometry, ST_SetSRID(ST_GeomFromText('{nova_scotia_bbox_wkt}'), 4326))
+""")
+```
+
+Take a look at the data contained in the `ns` DataFrame:
+
+```
+ns.show(3)
+
+┌──────────┬──────────┬──────────────────────────────────────┬─────────────────────────────────────┐
+│  country ┆  region  ┆                 names                ┆               geometry              │
+│ utf8view ┆ utf8view ┆ struct(primary utf8, common map(fie… ┆               geometry              │
+╞══════════╪══════════╪══════════════════════════════════════╪═════════════════════════════════════╡
+│ CA       ┆          ┆ {primary: Canada, common: {hy: Կան… ┆ MULTIPOLYGON(((-117.8317675 49.000… │
+├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+│ CA       ┆          ┆ {primary: Canada, common: {hy: Կան… ┆ MULTIPOLYGON(((-59.7502166 44.2338… │
+├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+│ CA       ┆ CA-NS    ┆ {primary: Seal Island, common: , ru… ┆ POLYGON((-66.0528452 43.4531336,-6… │
+└──────────┴──────────┴──────────────────────────────────────┴─────────────────────────────────────┘
+```
+
+You can see it only includes the divisions in the Nova Scotia area.  Skip to the visualization section to see how this data can be graphed on a map.
+
+***K-nearest neighbors (KNN) joins***
+
+Create `restaurants` and `customers` tables so we can demonstrate the KNN join functionality.
+
+```sql
+CREATE table restaurants AS (
+    SELECT * FROM (VALUES
+        (101, ST_Point(-74.01, 40.71), 'Pizza Palace'),
+        (102, ST_Point(-73.99, 40.69), 'Burger Barn'),
+        (103, ST_Point(-74.02, 40.72), 'Taco Town'),
+        (104, ST_Point(-73.98, 40.75), 'Sushi Spot'),
+        (105, ST_Point(-74.05, 40.68), 'Deli Direct')
+    ) AS t(id, location, name)
+)

Review Comment:
   I don't think this will pass with the current version of SedonaDB because of 
the DataFusion issue with VALUES. You should be able to use `SELECT id, 
ST_Point(lng, lat) AS location, name FROM (VALUES (1, -74.0, 40.7, 'Alice'), ...) 
AS t(id, lng, lat, name)` to work around this.
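
A sketch of how that workaround might look for the `restaurants` data in the guide, assuming the intent is to pass raw coordinates through `VALUES` and build the geometry in the outer `SELECT`. It registers a view with `sd.sql(...).to_view(...)`, as the guide does elsewhere, rather than issuing `CREATE TABLE`; the names `RESTAURANTS_SQL` and `create_restaurants_view` are hypothetical.

```python
# Raw coordinates go through VALUES; the geometry is built in the outer
# SELECT with ST_Point, so no geometry expression appears inside VALUES.
RESTAURANTS_SQL = """
SELECT id, ST_Point(lng, lat) AS location, name
FROM (VALUES
    (101, -74.01, 40.71, 'Pizza Palace'),
    (102, -73.99, 40.69, 'Burger Barn'),
    (103, -74.02, 40.72, 'Taco Town'),
    (104, -73.98, 40.75, 'Sushi Spot'),
    (105, -74.05, 40.68, 'Deli Direct')
) AS t(id, lng, lat, name)
"""


def create_restaurants_view(sd):
    """Register the restaurants rows as a view named `restaurants`.

    `sd` is a SedonaDB connection, e.g. from `sedona.db.connect()`.
    """
    sd.sql(RESTAURANTS_SQL).to_view("restaurants")
```

The `customers` table could presumably follow the same pattern with its own rows.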



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
