(sedona) 09/09: [DOCS] add geopackage docs (#1835)

jiayu Fri, 28 Feb 2025 09:45:39 -0800

This is an automated email from the ASF dual-hosted git repository.

jiayu pushed a commit to branch branch-1.7.0
in repository https://gitbox.apache.org/repos/asf/sedona.git


commit 53b5d410d180ebd7426c9f96d36e4c73c8f02184
Author: Matthew Powers <[email protected]>
AuthorDate: Thu Feb 27 16:44:37 2025 -0500

    [DOCS] add geopackage docs (#1835)
---
 docs/tutorial/files/geopackage-sedona-spark.md | 198 +++++++++++++++++++++++++
 docs/tutorial/sql.md                           |  64 +-------
 mkdocs.yml                                     |   1 +
 3 files changed, 201 insertions(+), 62 deletions(-)

diff --git a/docs/tutorial/files/geopackage-sedona-spark.md b/docs/tutorial/files/geopackage-sedona-spark.md
new file mode 100644
index 0000000000..aeeb94c5c0
--- /dev/null
+++ b/docs/tutorial/files/geopackage-sedona-spark.md
@@ -0,0 +1,198 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+# Apache Sedona GeoPackage with Spark
+
+This page shows how to read GeoPackage files with Apache Sedona and Spark.
+
+You’ll learn about the advantages and disadvantages of the GeoPackage file format and how to use it in production settings.
+
+Let’s start by creating a GeoPackage file and then reading it.
+
+## Reading a GeoPackage file with Sedona and Spark
+
+Let’s create a GeoPackage file with a few rows of data.
+
+Start by creating a GeoPandas DataFrame:
+
+```python
+import geopandas as gpd
+from shapely.geometry import Point, Polygon
+
+point1 = Point(0, 0)
+point2 = Point(1, 1)
+polygon1 = Polygon([(5, 5), (6, 6), (7, 5), (6, 4)])
+
+data = {
+    "name": ["Point A", "Point B", "Polygon A"],
+    "value": [10, 20, 30],
+    "geometry": [point1, point2, polygon1],
+}
+gdf = gpd.GeoDataFrame(data, geometry="geometry")
+```
+
+Now write the GeoPandas DataFrame to a GeoPackage file:
+
+```python
+gdf.to_file("/tmp/my_file.gpkg", layer="my_layer", driver="GPKG")
+```
+
+GeoPandas knows to write this to a GeoPackage file because the code sets the driver to `GPKG`.
+
+You can think of the layer as the table name.
+
+Now let’s read the GeoPackage file with Apache Sedona and Spark:
+
+```python
+# `sedona` is assumed to be an existing Sedona-enabled Spark session
+df = (
+    sedona.read.format("geopackage")
+    .option("tableName", "my_layer")
+    .load("/tmp/my_file.gpkg")
+)
+df.show()
+```
+
+Here are the contents of the DataFrame:
+
+```
++---+--------------------+---------+-----+
+|fid|                geom|     name|value|
++---+--------------------+---------+-----+
+|  1|         POINT (0 0)|  Point A|   10|
+|  2|         POINT (1 1)|  Point B|   20|
+|  3|POLYGON ((5 5, 6 ...|Polygon A|   30|
++---+--------------------+---------+-----+
+```
+
+The geometry column can contain different geometry types, such as points and polygons.
+
+You can also see the metadata of the GeoPackage file:
+
+```python
+df = (
+    sedona.read.format("geopackage")
+    .option("showMetadata", "true")
+    .load("/tmp/my_file.gpkg")
+)
+df.show()
+```
+
+Here are the contents:
+
+```
++----------+---------+----------+-----------+--------------------+-----+-----+-----+-----+------+
+|table_name|data_type|identifier|description|         last_change|min_x|min_y|max_x|max_y|srs_id|
++----------+---------+----------+-----------+--------------------+-----+-----+-----+-----+------+
+|  my_layer| features|  my_layer|           |2025-02-25 06:28:...|  0.0|  0.0|  7.0|  6.0| 99999|
++----------+---------+----------+-----------+--------------------+-----+-----+-----+-----+------+
+```
+
+## Reading many GeoPackage files with Sedona and Spark
+
+You can also read many GeoPackage files with Sedona. Suppose you have the following GeoPackage files:
+
+```
+gpkgs/
+  my_file1.gpkg
+  my_file2.gpkg
+```
+
+Here’s how you can read all the files:
+
+```python
+df = (
+    sedona.read.format("geopackage")
+    .option("tableName", "my_layer")
+    .load("/tmp/gpkgs")
+)
+df.show()
+```
+
+Here are the results:
+
+```
++---+--------------------+---------+-----+
+|fid|                geom|     name|value|
++---+--------------------+---------+-----+
+|  1|         POINT (5 5)|  Point C|   30|
+|  2|POLYGON ((5 5, 6 ...|Polygon A|   40|
+|  1|         POINT (0 0)|  Point A|   10|
+|  2|         POINT (1 1)|  Point B|   20|
++---+--------------------+---------+-----+
+```
+
+You just need to supply the directory containing the GeoPackage files, and Sedona can read all of them into a DataFrame.
+
+Sedona is an excellent option for analyzing many GeoPackage files because it can read and process them in parallel.
+
+## Load raster data stored in GeoPackage files
+
+You can also load data from raster tables in the GeoPackage file. To load raster data, you can use the following code.
+
+```python
+df = sedona.read.format("geopackage").option("tableName", "raster_table").load("/path/to/geopackage")
+```
+
+Here are the contents of the DataFrame:
+
+```
++---+----------+-----------+--------+--------------------+
+| id|zoom_level|tile_column|tile_row|           tile_data|
++---+----------+-----------+--------+--------------------+
+|  1|        11|        428|     778|GridCoverage2D["c...|
+|  2|        11|        429|     778|GridCoverage2D["c...|
+|  3|        11|        428|     779|GridCoverage2D["c...|
+|  4|        11|        429|     779|GridCoverage2D["c...|
+|  5|        11|        427|     777|GridCoverage2D["c...|
++---+----------+-----------+--------+--------------------+
+```
+
+Known limitations (v1.7.0):
+
+* WebP rasters are not supported
+* EWKB geometries are not supported
+* filtering based on geometry envelopes is not supported
+
+All points above should be resolved soon; stay tuned!
+
+## Advantages of the GeoPackage file format
+
+The GeoPackage file format has many advantages:
+
+* Any engine can support GeoPackage because it’s an open format.
+* It’s mutable, unlike many other formats.
+* It saves CRS information, unlike some other formats.
+* It can store spatial and raster data.
+* It can be read by many engines like GeoPandas, Sedona, and SQLite, of course.
+
+However, the GeoPackage format also has many downsides.
+
+## Disadvantages of GeoPackage
+
+The GeoPackage file format has the following disadvantages:
+
+* It’s row-oriented, so it can’t take advantage of column pruning like columnar file formats.
+* It does not support multi-engine concurrency transactions.
+* SQLite transactions are supported, but building reliable transactions with other engines would be hard.
+* Not all engines fully support it.
+
+## Conclusion
+
+GeoPackage is a solid file format if you’re using SQLite.
+
+It’s excellent that Sedona can read GeoPackage files created by SQLite analyses. This allows you to read GeoPackage files in parallel and analyze massive datasets. You can also run Sedona on a cluster.
+
+If you don’t already use GeoPackage, you should probably use file formats like GeoParquet or Iceberg.
diff --git a/docs/tutorial/sql.md b/docs/tutorial/sql.md
index 828d1dd936..aa894bce5d 100644
--- a/docs/tutorial/sql.md
+++ b/docs/tutorial/sql.md
@@ -658,7 +658,7 @@ For Postgis there is no need to add a query to convert geometry types since it's
                .withColumn("geom", f.expr("ST_GeomFromWKB(geom)")))
        ```
 
-## Load from geopackage
+## Load from GeoPackage
 
 Since v1.7.0, Sedona supports loading Geopackage file format as a DataFrame.
 
@@ -680,67 +680,7 @@ Since v1.7.0, Sedona supports loading Geopackage file format as a DataFrame.
        df = sedona.read.format("geopackage").option("tableName", "tab").load("/path/to/geopackage")
        ```
 
-Geopackage files can contain vector data and raster data. To show the possible options from a file you can
-look into the metadata table by adding parameter showMetadata and set its value as true.
-
-=== "Scala/Java"
-
-       ```scala
-       val df = sedona.read.format("geopackage").option("showMetadata", "true").load("/path/to/geopackage")
-       ```
-
-=== "Java"
-
-       ```java
-       Dataset<Row> df = sedona.read().format("geopackage").option("showMetadata", "true").load("/path/to/geopackage")
-       ```
-
-=== "Python"
-
-       ```python
-       df = sedona.read.format("geopackage").option("showMetadata", "true").load("/path/to/geopackage")
-
-Then you can see the metadata of the geopackage file like below.
-
-```
-+--------------------+---------+--------------------+-----------+--------------------+----------+-----------------+----------+----------+------+
-|          table_name|data_type|          identifier|description|         last_change|     min_x|            min_y|     max_x|     max_y|srs_id|
-+--------------------+---------+--------------------+-----------+--------------------+----------+-----------------+----------+----------+------+
-|gis_osm_water_a_f...| features|gis_osm_water_a_f...|           |2024-09-30 23:07:...|-9.0257084|57.96814069999999|33.4866675|80.4291867|  4326|
-+--------------------+---------+--------------------+-----------+--------------------+----------+-----------------+----------+----------+------+
-```
-
-You can also load data from raster tables in the geopackage file. To load raster data, you can use the following code.
-
-=== "Scala/Java"
-
-       ```scala
-       val df = sedona.read.format("geopackage").option("tableName", "raster_table").load("/path/to/geopackage")
-       ```
-
-=== "Java"
-
-       ```java
-       Dataset<Row> df = sedona.read().format("geopackage").option("tableName", "raster_table").load("/path/to/geopackage")
-       ```
-
-=== "Python"
-
-       ```python
-       df = sedona.read.format("geopackage").option("tableName", "raster_table").load("/path/to/geopackage")
-       ```
-
-```
-+---+----------+-----------+--------+--------------------+
-| id|zoom_level|tile_column|tile_row|           tile_data|
-+---+----------+-----------+--------+--------------------+
-|  1|        11|        428|     778|GridCoverage2D["c...|
-|  2|        11|        429|     778|GridCoverage2D["c...|
-|  3|        11|        428|     779|GridCoverage2D["c...|
-|  4|        11|        429|     779|GridCoverage2D["c...|
-|  5|        11|        427|     777|GridCoverage2D["c...|
-+---+----------+-----------+--------+--------------------+
-```
+See [this page](../files/geopackage-sedona-spark) for more information on loading GeoPackage.
 
 Known limitations (v1.7.0):
 
diff --git a/mkdocs.yml b/mkdocs.yml
index 2b15dc683a..fca59dde38 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -62,6 +62,7 @@ nav:
           - Work with GeoPandas and Shapely: tutorial/geopandas-shapely.md
           - Files:
               - CSV: tutorial/files/csv-geometry-sedona-spark.md
+              - GeoPackage: tutorial/files/geopackage-sedona-spark.md
               - GeoParquet: tutorial/files/geoparquet-sedona-spark.md
               - GeoJSON: tutorial/files/geojson-sedona-spark.md
           - Map visualization SQL app:
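
A note on the `showMetadata` output in the new tutorial page: its columns (`table_name`, `data_type`, `identifier`, ..., `srs_id`) match the `gpkg_contents` table that the GeoPackage format defines, and a GeoPackage file is itself an SQLite database. The sketch below is not part of the commit; it only illustrates how those rows could be inspected with Python's standard `sqlite3` module. It assumes a simplified `gpkg_contents` schema and uses an in-memory database as a stand-in for a real `.gpkg` file. The helper name `list_layers` is hypothetical.

```python
import sqlite3

# Simplified stand-in for the gpkg_contents table (assumption: the real
# table carries extra constraints and defaults that are omitted here).
CONTENTS_DDL = """
CREATE TABLE gpkg_contents (
    table_name TEXT PRIMARY KEY,
    data_type TEXT NOT NULL,
    identifier TEXT,
    description TEXT,
    last_change TEXT,
    min_x REAL, min_y REAL, max_x REAL, max_y REAL,
    srs_id INTEGER
)
"""


def list_layers(conn):
    """Return (table_name, data_type) pairs, sorted by table name.

    Hypothetical helper for illustration only.
    """
    rows = conn.execute(
        "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
    )
    return [(name, kind) for name, kind in rows]


# With a real file you would connect to its path instead of ":memory:".
conn = sqlite3.connect(":memory:")
conn.execute(CONTENTS_DDL)

# One row mirroring the values shown in the tutorial's metadata output.
conn.execute(
    "INSERT INTO gpkg_contents "
    "(table_name, data_type, identifier, min_x, min_y, max_x, max_y, srs_id) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("my_layer", "features", "my_layer", 0.0, 0.0, 7.0, 6.0, 99999),
)

layers = list_layers(conn)
print(layers)
```

Each `table_name` returned this way is a candidate value for the `tableName` option used in the tutorial, and `data_type` distinguishes feature (vector) tables from tile (raster) tables.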
