MrPowers commented on code in PR #238:
URL: https://github.com/apache/sedona-db/pull/238#discussion_r2460807738


##########
docs/delta-lake.ipynb:
##########
@@ -0,0 +1,242 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "3425268d-6430-4e52-9019-969d61ef5458",
+   "metadata": {},
+   "source": [
+    "# SedonaDB + Delta Lake\n",
+    "\n",
+    "This page shows how to read and write Delta Lake tables with SedonaDB.\n",
+    "\n",
+    "Make sure you run `pip install deltalake` to run the cells in this notebook."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "83f5ca25-8059-4624-bd00-e44cd172d9c2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from deltalake import write_deltalake, DeltaTable\n",
+    "import sedona.db\n",
+    "\n",
+    "sd = sedona.db.connect()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1a90b1bf-99f9-4f50-987b-f1f30bf9988a",
+   "metadata": {},
+   "source": [
+    "Read a GeoParquet dataset into a SedonaDB DataFrame."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "a37f3a30-3267-4bfc-8e5b-e234053927af",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "countries = sd.read_parquet(\n",
+    "    \"https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_countries_geo.parquet\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1f82b047-8182-4120-82d7-945ff38ecbca",
+   "metadata": {},
+   "source": [
+    "## Create a Delta Lake table\n",
+    "\n",
+    "Now write the DataFrame to a Delta Lake table. Delta Lake does not support geometry columns, so the geometry column must be converted to Well-Known Text (WKT) before writing to the Delta table."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "35bdc296-d9ef-4ea2-9ba5-dac1cfd115ef",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "countries.to_view(\"countries\")\n",
+    "df = sd.sql(\n",
+    "    \"select name, continent, ST_AsText(geometry) as geometry_wkt from countries\"\n",
+    ")\n",
+    "table_path = \"/tmp/delta_with_wkt\"\n",
+    "write_deltalake(table_path, df.to_pandas(), mode=\"overwrite\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "974a5558-fb18-49b7-a304-fa93bdd363e8",
+   "metadata": {},
+   "source": [
+    "## Read Delta table into SedonaDB\n",
+    "\n",
+    "Now read the Delta table back into a SedonaDB DataFrame."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "c15d4605-483a-4041-973a-547098bdeef4",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "┌─────────────────────────────┬───────────────┬────────────────────────────────────────────────────┐\n",
+      "│             name            ┆   continent   ┆                    geometry_wkt                    │\n",
+      "│             utf8            ┆      utf8     ┆                        utf8                        │\n",
+      "╞═════════════════════════════╪═══════════════╪════════════════════════════════════════════════════╡\n",
+      "│ Fiji                        ┆ Oceania       ┆ MULTIPOLYGON(((180 -16.067132663642447,180 -16.55… │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ United Republic of Tanzania ┆ Africa        ┆ POLYGON((33.90371119710453 -0.9500000000000001,34… │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Western Sahara              ┆ Africa        ┆ POLYGON((-8.665589565454809 27.656425889592356,-8… │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Canada                      ┆ North America ┆ MULTIPOLYGON(((-122.84000000000003 49.00000000000… │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ United States of America    ┆ North America ┆ MULTIPOLYGON(((-122.84000000000003 49.00000000000… │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Kazakhstan                  ┆ Asia          ┆ POLYGON((87.35997033076265 49.21498078062912,86.5… │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Uzbekistan                  ┆ Asia          ┆ POLYGON((55.96819135928291 41.30864166926936,55.9… │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Papua New Guinea            ┆ Oceania       ┆ MULTIPOLYGON(((141.00021040259185 -2.600151055515… │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Indonesia                   ┆ Asia          ┆ MULTIPOLYGON(((141.00021040259185 -2.600151055515… │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Argentina                   ┆ South America ┆ MULTIPOLYGON(((-68.63401022758323 -52.63637045887… │\n",
+      "└─────────────────────────────┴───────────────┴────────────────────────────────────────────────────┘\n"
+     ]
+    }
+   ],
+   "source": [
+    "dt = DeltaTable(table_path)\n",
+    "arrow_table = dt.to_pyarrow_table()\n",

Review Comment:
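   Yeah, for sure. With DuckDB, the "read a Delta table, then filter it" query
pattern benefits from passing a PyArrow Dataset rather than a PyArrow Table,
because DuckDB can push filters and column selections down into a Dataset scan
instead of working from a fully materialized table. I can put together a small
benchmark if that would be useful.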
   
   I added a filtering example to the notebook to show the benefits of the 
geometry data type.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
