This is an automated email from the ASF dual-hosted git repository.

jiayu pushed a commit to branch docs-mar-07
in repository https://gitbox.apache.org/repos/asf/sedona.git
commit f2f1132cc06abf004ad459bc12b49cce9bb11202
Author: Jia Yu <[email protected]>
AuthorDate: Mon Mar 10 00:04:27 2025 -0700

    Fix multiple typos and structure
---
 docs/tutorial/files/stac-sedona-spark.md | 32 ++++++++++-------------
 docs/tutorial/sql.md                     | 44 ++++++++++++++++++++++----------
 2 files changed, 44 insertions(+), 32 deletions(-)

diff --git a/docs/tutorial/files/stac-sedona-spark.md b/docs/tutorial/files/stac-sedona-spark.md
index 09bcba2328..062e6c5f55 100644
--- a/docs/tutorial/files/stac-sedona-spark.md
+++ b/docs/tutorial/files/stac-sedona-spark.md
@@ -274,10 +274,11 @@ These examples demonstrate how to use the Client class to search for items in a
 Opens a connection to the specified STAC API URL.
 
 Parameters:
-* `url` (*str*): The URL of the STAC API to connect to.
-  * Example: `"https://planetarycomputer.microsoft.com/api/stac/v1"`
+
+* `url` (*str*): The URL of the STAC API to connect to. Example: `"https://planetarycomputer.microsoft.com/api/stac/v1"`
 
 Returns:
+
 * `Client`: An instance of the `Client` class connected to the specified URL.
 
 ---
@@ -286,10 +287,11 @@ Returns:
 Retrieves a collection client for the specified collection ID.
 
 Parameters:
-* `collection_id` (*str*): The ID of the collection to retrieve.
-  * Example: `"aster-l1t"`
+
+* `collection_id` (*str*): The ID of the collection to retrieve. Example: `"aster-l1t"`
 
 Returns:
+
 * `CollectionClient`: An instance of the `CollectionClient` class for the specified collection.
 
 ---
@@ -299,23 +301,15 @@ Searches for items in the specified collection with optional filters.
 
 Parameters:
 
-* `ids` (*Union[str, list]*): A variable number of item IDs to filter the items.
-  * Example: `"item_id1"` or `["item_id1", "item_id2"]`
-* `collection_id` (*str*): The ID of the collection to search in.
-  * Example: `"aster-l1t"`
-* `bbox` (*Optional[list]*): A list of bounding boxes for filtering the items. Each bounding box is represented as a list of four float values: `[min_lon, min_lat, max_lon, max_lat]`.
-  * Example: `[[ -180.0, -90.0, 180.0, 90.0 ]]`
-* `datetime` (*Optional[Union[str, datetime.datetime, list]]*): A single datetime, RFC 3339-compliant timestamp, or a list of date-time ranges for filtering the items.
-  * Examples:
-    * `"2020-01-01T00:00:00Z"`
-    * `datetime.datetime(2020, 1, 1)`
-    * `[["2020-01-01T00:00:00Z", "2021-01-01T00:00:00Z"]]`
-* `max_items` (*Optional[int]*): The maximum number of items to return from the search, even if there are more matching results.
-  * Example: `100`
-* `return_dataframe` (*bool*): If `True` (default), return the result as a Spark DataFrame instead of an iterator of `PyStacItem` objects.
-  * Example: `True`
+* `ids` (*Union[str, list]*): A variable number of item IDs to filter the items. Example: `"item_id1"` or `["item_id1", "item_id2"]`
+* `collection_id` (*str*): The ID of the collection to search in. Example: `"aster-l1t"`
+* `bbox` (*Optional[list]*): A list of bounding boxes for filtering the items, represented as `[min_lon, min_lat, max_lon, max_lat]`. Example: `[[ -180.0, -90.0, 180.0, 90.0 ]]`
+* `datetime` (*Optional[Union[str, datetime.datetime, list]]*): A single datetime, RFC 3339-compliant timestamp, or a list of date-time ranges. Example: `"2020-01-01T00:00:00Z"`, `datetime.datetime(2020, 1, 1)`, `[["2020-01-01T00:00:00Z", "2021-01-01T00:00:00Z"]]`
+* `max_items` (*Optional[int]*): The maximum number of items to return. Example: `100`
+* `return_dataframe` (*bool*): If `True` (default), return the result as a Spark DataFrame instead of an iterator of `PyStacItem` objects. Example: `True`
 
 Returns:
+
 * *Union[Iterator[PyStacItem], DataFrame]*: An iterator of `PyStacItem` objects or a Spark DataFrame that matches the specified filters.
 
 ## References
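Reviewer note: the hunks above only restructure the parameter lists, so the page never shows the calls end to end. The sketch below pieces them together from the documented names; the import path `sedona.stac.client` and the surrounding Spark session are assumptions, not content of this commit.

```python
# Hedged sketch of the documented Client workflow; the import path and session
# setup are assumed, the parameter names come from the doc text above.
from sedona.stac.client import Client

client = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

# Search one collection with a bounding box, a date-time range, and an item
# cap, returning a Spark DataFrame (the documented default behavior).
df = client.search(
    collection_id="aster-l1t",
    bbox=[[-180.0, -90.0, 180.0, 90.0]],
    datetime=[["2020-01-01T00:00:00Z", "2021-01-01T00:00:00Z"]],
    max_items=100,
    return_dataframe=True,
)
df.printSchema()
```

Passing `return_dataframe=False` would instead yield an iterator of `PyStacItem` objects, per the Returns description above.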
diff --git a/docs/tutorial/sql.md b/docs/tutorial/sql.md
index fecbf3ef82..2564e39c25 100644
--- a/docs/tutorial/sql.md
+++ b/docs/tutorial/sql.md
@@ -139,7 +139,7 @@ Add the following line after creating Sedona config. If you already have a Spark
 
 You can also register everything by passing `--conf spark.sql.extensions=org.apache.sedona.sql.SedonaSqlExtensions` to `spark-submit` or `spark-shell`.
 
-## Load data from files
+## Load data from text files
 
 Assume we have a WKT file, namely `usa-county.tsv`, at Path `/Download/usa-county.tsv` as follows:
 
@@ -152,6 +152,8 @@ POLYGON (..., ...) Lancaster County
 
 The file may have many other columns.
 
+### Load the raw DataFrame
+
 Use the following code to load the data and create a raw DataFrame:
 
 === "Scala"
@@ -186,7 +188,7 @@ The output will be like this:
 |POLYGON ((-96.910...| 31|109|00835876|31109| Lancaster| Lancaster County| 06| H1|G4020| 339|30700|null| A|2169240202|22877180|+40.7835474|-096.6886584|
 ```
 
-## Create a Geometry type column
+### Create a Geometry type column
 
 All geometrical operations in SedonaSQL are on Geometry type objects. Therefore, before any kind of queries, you need to create a Geometry type column on a DataFrame.
 
@@ -347,6 +349,30 @@ Please refer to [Reading Legacy Parquet Files](../api/sql/Reading-legacy-parquet
 
 See [this page](files/geoparquet-sedona-spark.md) for more information on loading GeoParquet.
 
+## Load data from STAC catalog
+
+Sedona STAC data source allows you to read data from a SpatioTemporal Asset Catalog (STAC) API. The data source supports reading STAC items and collections.
+
+You can load a STAC collection from a s3 collection file object:
+
+```python
+df = sedona.read.format("stac").load(
+    "s3a://example.com/stac_bucket/stac_collection.json"
+)
+```
+
+You can also load a STAC collection from an HTTP/HTTPS endpoint:
+
+```python
+df = sedona.read.format("stac").load(
+    "https://earth-search.aws.element84.com/v1/collections/sentinel-2-pre-c1-l2a"
+)
+```
+
+The STAC data source supports predicate pushdown for spatial and temporal filters. The data source can push down spatial and temporal filters to the underlying data source to reduce the amount of data that needs to be read.
+
+See [this page](files/stac-sedona-spark.md) for more information on loading data from STAC.
+
 ## Load data from JDBC data sources
 
 The 'query' option in Spark SQL's JDBC data source can be used to convert geometry columns to a format that Sedona can interpret.
@@ -407,7 +433,7 @@ For Postgis there is no need to add a query to convert geometry types since it's
                 .withColumn("geom", f.expr("ST_GeomFromWKB(geom)")))
     ```
 
-## Load from GeoPackage
+## Load GeoPackage
 
 Since v1.7.0, Sedona supports loading Geopackage file format as a DataFrame.
 
@@ -431,7 +457,7 @@ Since v1.7.0, Sedona supports loading Geopackage file format as a DataFrame.
 
 See [this page](files/geopackage-sedona-spark.md) for more information on loading GeoPackage.
 
-## Load from OSM PBF
+## Load OSM PBF
 
 Since v1.7.1, Sedona supports loading OSM PBF file format as a DataFrame.
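Reviewer note: apart from the new STAC section, the sql.md hunks above mostly rename headings around the text-file walkthrough, so the load-and-convert code itself stays outside the diff context. A rough sketch of what those renamed sections cover is below; it assumes a Sedona-enabled SparkSession named `sedona` (as in the STAC snippets) and the sample TSV layout, with the WKT in `_c0` and the county name in `_c1`, neither of which is part of this commit.

```python
# Sketch of "Load the raw DataFrame" followed by "Create a Geometry type column".
# The session name, file path, and column positions are assumptions.
raw_df = (
    sedona.read.option("delimiter", "\t")
    .option("header", "false")
    .csv("/Download/usa-county.tsv")
)
raw_df.createOrReplaceTempView("rawdf")

# Parse the WKT text into a Geometry column before running any spatial queries.
county_df = sedona.sql(
    "SELECT ST_GeomFromText(_c0) AS countyshape, _c1 AS name FROM rawdf"
)
county_df.show(5)
```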
@@ -527,14 +553,6 @@ and for relation
 +-----+--------+--------+--------------------+--------------------+--------------------+--------------------+
 ```
 
-Known limitations (v1.7.0):
-
-- webp rasters are not supported
-- ewkb geometries are not supported
-- filtering based on geometries envelopes are not supported
-
-All points above should be resolved soon, stay tuned !
-
 ## Transform the Coordinate Reference System
 
 Sedona doesn't control the coordinate unit (degree-based or meter-based) of all geometries in a Geometry column. The unit of all related distances in SedonaSQL is same as the unit of all geometries in a Geometry column.
@@ -1188,7 +1206,7 @@ SELECT ST_AsText(countyshape)
 FROM polygondf
 ```
 
-## Save as GeoJSON
+## Save GeoJSON
 
 Since `v1.6.1`, the GeoJSON data source in Sedona can be used to save a Spatial DataFrame to a single-line JSON file, with geometries written in GeoJSON format.
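Reviewer note: the last hunk renames the GeoJSON heading, and the surrounding context mentions that distance units follow the geometry column's CRS, but neither section's code appears in the diff. Continuing the `county_df` sketch from earlier, the snippet below illustrates what those two sections describe; the EPSG codes, output path, and column names are illustrative assumptions, not content of this commit.

```python
# Reproject the geometry column from degree-based WGS84 to meter-based
# EPSG:3857 so that distance results come out in meters; codes are assumed.
projected_df = county_df.selectExpr(
    "ST_Transform(countyshape, 'EPSG:4326', 'EPSG:3857') AS countyshape",
    "name"
)

# Write the result with geometries encoded as GeoJSON (supported since v1.6.1).
projected_df.write.format("geojson").mode("overwrite").save("/tmp/usa-county-geojson")
```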
