This is an automated email from the ASF dual-hosted git repository.
cloud-fan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new e648859ac6c3 [SPARK-56813][DOCS] Refine the documentation for
geospatial types and functions
e648859ac6c3 is described below
commit e648859ac6c3a9918c81cc95b61b658c6b3dca54
Author: Uros Bojanic <[email protected]>
AuthorDate: Wed May 13 08:58:33 2026 +0800
[SPARK-56813][DOCS] Refine the documentation for geospatial types and
functions
### What changes were proposed in this pull request?
Tighten the geospatial documentation:
- `sql-ref-geospatial-types.md`: update the documentation for the supported ST functions.
- `sql-ref-datatypes.md`: drop the unparameterized type syntax from the SQL, PySpark, and Scala entries.
- `sql-ref-functions-builtin.md`: surface the `st_funcs` group as **Geospatial ST Functions**.
- `sql-migration-guide.md`: note that geospatial types and ST functions are enabled as of 4.2.
- etc.
### Why are the changes needed?
Fill gaps and fix inaccuracies in the geospatial documentation.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests suffice for docs-only changes.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7.
Closes #55790 from uros-db/geo-docs-refine.
Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
---
docs/sql-ref-datatypes.md | 16 ++++++++--------
docs/sql-ref-functions-builtin.md | 5 +++++
docs/sql-ref-geospatial-types.md | 30 ++++++++++++++++++++----------
sql/gen-sql-functions-docs.py | 3 ++-
4 files changed, 35 insertions(+), 19 deletions(-)
diff --git a/docs/sql-ref-datatypes.md b/docs/sql-ref-datatypes.md
index 743ad4e3abb2..0ae05d8f46be 100644
--- a/docs/sql-ref-datatypes.md
+++ b/docs/sql-ref-datatypes.md
@@ -95,8 +95,8 @@ Spark SQL and DataFrames support the following data types:
* Spatial types
Spatial objects as defined in the [OGC Simple Feature Access](https://portal.ogc.org/files/?artifact_id=25355) specification.
- - `GeometryType`: Represents GEOMETRY values—spatial objects in a Cartesian
coordinate system. The type can be fixed to a single SRID, e.g.
`geometry(4326)`, or allow mixed SRIDs with `geometry(any)`. Default SRID when
not specified is 4326 (WGS 84).
- - `GeographyType`: Represents GEOGRAPHY values—spatial objects in a
geographic coordinate system (latitude/longitude). Edge interpolation is always
SPHERICAL. The type can be fixed to a single SRID, e.g. `geography(4326)`, or
allow mixed SRIDs with `geography(any)`. Default SRID is 4326 (WGS 84).
+ - `GeometryType`: Represents GEOMETRY values, spatial objects in a Cartesian coordinate system. The type can be fixed to a single SRID, e.g. `geometry(4326)`, or allow mixed SRIDs with `geometry(any)`. In SQL, `GEOMETRY` columns must always be declared with an explicit SRID or `ANY`.
+ - `GeographyType`: Represents GEOGRAPHY values, spatial objects in a geographic coordinate system (latitude/longitude). Edge interpolation is always SPHERICAL. The type can be fixed to a single geographic SRID, e.g. `geography(4326)`, or allow mixed SRIDs with `geography(any)`. In SQL, `GEOGRAPHY` columns must always be declared with an explicit SRID or `ANY`.
For more details and built-in functions, see [Geospatial (Geometry/Geography) types](sql-ref-geospatial-types.html).
* Complex types
@@ -143,8 +143,8 @@ from pyspark.sql.types import *
|**TimestampNTZType**|datetime.datetime|TimestampNTZType()|
|**DateType**|datetime.date|DateType()|
|**DayTimeIntervalType**|datetime.timedelta|DayTimeIntervalType()|
-|**GeometryType**|Geometry|GeometryType() or GeometryType(*srid*)|
-|**GeographyType**|Geography|GeographyType() or GeographyType(*srid*)|
+|**GeometryType**|Geometry|GeometryType(*srid*)<br/>**Note:** *srid* is required and may be an `int` or the string `"ANY"`.|
+|**GeographyType**|Geography|GeographyType(*srid*)<br/>**Note:** *srid* is required and may be an `int` or the string `"ANY"`.|
|**ArrayType**|list, tuple, or array|ArrayType(*elementType*, [*containsNull*])<br/>**Note:**The default value of *containsNull* is True.|
|**MapType**|dict|MapType(*keyType*, *valueType*, [*valueContainsNull]*)<br/>**Note:**The default value of *valueContainsNull* is True.|
|**StructType**|list or tuple|StructType(*fields*)<br/>**Note:** *fields* is a Seq of StructFields. Also, two fields with the same name are not allowed.|
@@ -179,8 +179,8 @@ You can access them by doing
|**TimeType**|java.time.LocalTime|TimeType|
|**YearMonthIntervalType**|java.time.Period|YearMonthIntervalType|
|**DayTimeIntervalType**|java.time.Duration|DayTimeIntervalType|
-|**GeometryType**|org.apache.spark.sql.types.Geometry|GeometryType or GeometryType(*srid*)|
-|**GeographyType**|org.apache.spark.sql.types.Geography|GeographyType or GeographyType(*srid*)|
+|**GeometryType**|org.apache.spark.sql.types.Geometry|GeometryType(*srid*)|
+|**GeographyType**|org.apache.spark.sql.types.Geography|GeographyType(*srid*)|
|**ArrayType**|scala.collection.Seq|ArrayType(*elementType*, [*containsNull]*)<br/>**Note:** The default value of *containsNull* is true.|
|**MapType**|scala.collection.Map|MapType(*keyType*, *valueType*, [*valueContainsNull]*)<br/>**Note:** The default value of *valueContainsNull* is true.|
|**StructType**|org.apache.spark.sql.Row|StructType(*fields*)<br/>**Note:** *fields* is a Seq of StructFields. Also, two fields with the same name are not allowed.|
@@ -272,8 +272,8 @@ The following table shows the type names as well as aliases used in Spark SQL pa
|**DecimalType**|DECIMAL, DEC, NUMERIC|
|**YearMonthIntervalType**|INTERVAL YEAR, INTERVAL YEAR TO MONTH, INTERVAL MONTH|
|**DayTimeIntervalType**|INTERVAL DAY, INTERVAL DAY TO HOUR, INTERVAL DAY TO MINUTE, INTERVAL DAY TO SECOND, INTERVAL HOUR, INTERVAL HOUR TO MINUTE, INTERVAL HOUR TO SECOND, INTERVAL MINUTE, INTERVAL MINUTE TO SECOND, INTERVAL SECOND|
-|**GeometryType**|GEOMETRY or GEOMETRY(*srid*) or GEOMETRY(ANY)|
-|**GeographyType**|GEOGRAPHY or GEOGRAPHY(*srid*) or GEOGRAPHY(ANY)|
+|**GeometryType**|GEOMETRY(*srid*) or GEOMETRY(ANY)|
+|**GeographyType**|GEOGRAPHY(*srid*) or GEOGRAPHY(ANY)|
|**ArrayType**|ARRAY\<element_type>|
|**StructType**|STRUCT<field1_name: field1_type, field2_name: field2_type, ...><br/> **Note:** ':' is optional.|
|**MapType**|MAP<key_type, value_type>|
diff --git a/docs/sql-ref-functions-builtin.md b/docs/sql-ref-functions-builtin.md
index b6572609a34b..1912a1e577d5 100644
--- a/docs/sql-ref-functions-builtin.md
+++ b/docs/sql-ref-functions-builtin.md
@@ -126,3 +126,8 @@ license: |
{% include_api_gen generated-variant-funcs-table.html %}
#### Examples
{% include_api_gen generated-variant-funcs-examples.html %}
+
+### Geospatial ST Functions
+{% include_api_gen generated-st-funcs-table.html %}
+#### Examples
+{% include_api_gen generated-st-funcs-examples.html %}
diff --git a/docs/sql-ref-geospatial-types.md b/docs/sql-ref-geospatial-types.md
index ed8b6597ae1f..d5a9d0fece84 100644
--- a/docs/sql-ref-geospatial-types.md
+++ b/docs/sql-ref-geospatial-types.md
@@ -25,8 +25,13 @@ Spark SQL supports **GEOMETRY** and **GEOGRAPHY** types for spatial data, as def
| Type | Coordinate system | Typical use and notes |
|------|-------------------|------------------------|
-| **GEOMETRY** | Cartesian (planar) | Projected or local coordinates; planar calculations. Represents points, lines, polygons in a flat coordinate system. Suitable for Web Mercator (SRID 3857), UTM, or local grids (e.g. engineering/CAD). Default SRID in Spark is 4326. |
-| **GEOGRAPHY** | Geographic (latitude/longitude) | Earth-based data; distances and areas on the sphere/ellipsoid. Coordinates in longitude and latitude (degrees). Edge interpolation is always **SPHERICAL**. Default SRID is 4326 (WGS 84). |
+| **GEOMETRY** | Cartesian (planar) | Projected or local coordinates; planar calculations. Represents points, lines, polygons in a flat coordinate system. Suitable for Web Mercator (SRID 3857), UTM, or local grids (e.g. engineering/CAD). Accepts any SRID in the registry, including SRID 0 (unspecified CRS). |
+| **GEOGRAPHY** | Geographic (latitude/longitude) | Earth-based data; distances and areas on the sphere/ellipsoid. Coordinates in longitude and latitude (degrees). Edge interpolation is always **SPHERICAL**. Only geographic SRIDs are accepted; the most common is 4326 (WGS 84). |
+
+In SQL, `GEOMETRY` and `GEOGRAPHY` columns must always be declared with an explicit SRID
+(or `ANY`); see [Type Syntax in SQL](#type-syntax-in-sql) below. When a value is constructed
+via `ST_GeomFromWKB(wkb)` without an explicit SRID, the value's SRID is `0` (unspecified),
+while `ST_GeogFromWKB(wkb)` always returns a value with SRID 4326.
#### When to use GEOMETRY vs GEOGRAPHY
@@ -113,16 +118,18 @@ When parsing WKB, Spark applies the following rules. Violations result in a pars
### Built-in Geospatial (ST) Functions
-Spark SQL provides scalar functions for working with GEOMETRY and GEOGRAPHY values. They are grouped under **st_funcs** in the [Built-in Functions](sql-ref-functions-builtin.html) API.
+Spark SQL provides scalar functions for working with GEOMETRY and GEOGRAPHY values. The full list,
+with detailed argument descriptions and examples, is on the
+[Built-in Functions](sql-ref-functions-builtin.html#geospatial-st-functions) page under
+**Geospatial ST Functions**. The functions provided in the current release are summarized here:
| Function | Description |
|----------|-------------|
-| `ST_AsBinary(geo)` | Returns the GEOMETRY or GEOGRAPHY value as WKB (BINARY). |
-| `ST_GeomFromWKB(wkb)` | Parses WKB and returns a GEOMETRY with default SRID 0. |
-| `ST_GeomFromWKB(wkb, srid)` | Parses WKB and returns a GEOMETRY with the given SRID. |
+| `ST_AsBinary(geo[, endianness])` | Returns the GEOMETRY or GEOGRAPHY value as WKB (BINARY). The optional `endianness` argument is `'NDR'` for little-endian (default) or `'XDR'` for big-endian. |
+| `ST_GeomFromWKB(wkb[, srid])` | Parses WKB and returns a GEOMETRY. The optional `srid` argument sets the SRID; if omitted, the SRID is `0`. |
| `ST_GeogFromWKB(wkb)` | Parses WKB and returns a GEOGRAPHY with SRID 4326. |
| `ST_Srid(geo)` | Returns the SRID of the GEOMETRY or GEOGRAPHY value (NULL if input is NULL). |
-| `ST_SetSrid(geo, srid)` | Returns a new GEOMETRY or GEOGRAPHY with the given SRID. |
+| `ST_SetSrid(geo, srid)` | Returns a new GEOMETRY or GEOGRAPHY with the given SRID. The new SRID must be valid for the value's type. |
**Examples:**
@@ -130,6 +137,9 @@ Spark SQL provides scalar functions for working with GEOMETRY and GEOGRAPHY valu
SELECT hex(ST_AsBinary(ST_GeogFromWKB(X'0101000000000000000000F03F0000000000000040')));
-- 0101000000000000000000F03F0000000000000040
+SELECT hex(ST_AsBinary(ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040'), 'XDR'));
+-- 00000000013FF00000000000004000000000000000
+
SELECT ST_Srid(ST_GeogFromWKB(X'0101000000000000000000F03F0000000000000040'));
-- 4326
@@ -139,9 +149,9 @@ SELECT ST_Srid(ST_SetSrid(ST_GeomFromWKB(X'0101000000000000000000F03F00000000000
### SRID and Stored Values
-* **Fixed-SRID columns**: Every value in the column must have the same SRID as the column type. Inserting a value with a different SRID can raise an error (or you can use `ST_SetSrid` to set the value’s SRID to match the column).
-* **Mixed-SRID columns** (`GEOMETRY(ANY)` or `GEOGRAPHY(ANY)`): Values can have different SRIDs. Only valid SRIDs are allowed.
-* **Storage**: Parquet, Delta, and Iceberg store geometry/geography with a fixed SRID per column; mixed-SRID types are for in-memory/query use. When writing to these formats, a concrete (fixed) SRID is required.
+* **Fixed-SRID columns**: Every value in the column must have the same SRID as the column type. Inserting a value with a different SRID raises a `GEO_ENCODER_SRID_MISMATCH_ERROR`. Use `ST_SetSrid` to change a value's SRID to match the column.
+* **Mixed-SRID columns** (`GEOMETRY(ANY)` or `GEOGRAPHY(ANY)`): Values can have different SRIDs per row. Each value must still have a valid SRID for the type; an invalid SRID raises `ST_INVALID_SRID_VALUE`.
+* **Storage**: Parquet, Delta, and Iceberg store geometry/geography with a fixed SRID per column. They do not support persisting `GEOMETRY(ANY)` or `GEOGRAPHY(ANY)`; mixed-SRID types exist for in-memory/query use only.
### Supported SRIDs
diff --git a/sql/gen-sql-functions-docs.py b/sql/gen-sql-functions-docs.py
index 13f9ae055fa7..2ae00f6db822 100644
--- a/sql/gen-sql-functions-docs.py
+++ b/sql/gen-sql-functions-docs.py
@@ -36,7 +36,8 @@ groups = {
"bitwise_funcs", "conversion_funcs", "csv_funcs",
"xml_funcs", "lambda_funcs", "collection_funcs",
"url_funcs", "hash_funcs", "struct_funcs",
- "table_funcs", "variant_funcs", "protobuf_funcs", "sketch_funcs"
+ "table_funcs", "variant_funcs", "protobuf_funcs", "sketch_funcs",
+ "st_funcs"
}
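
Reviewer note: the two WKB hex strings in the `sql-ref-geospatial-types.md` examples can be cross-checked without a Spark session. The snippet below is a minimal sketch, assuming the example value is the 2D point (1, 2) in standard OGC WKB layout (one byte-order byte, a 4-byte geometry type of 1 for Point, then two 8-byte doubles). `point_wkb_hex` is a hypothetical helper written for this check; it is not part of Spark or of this commit.

```python
import struct


def point_wkb_hex(x: float, y: float, endianness: str = "NDR") -> str:
    """Encode a 2D point as OGC WKB and return it as uppercase hex.

    Hypothetical helper for checking the docs examples; not a Spark API.
    'NDR' means little-endian (byte-order byte 1), 'XDR' big-endian (byte 0).
    """
    if endianness == "NDR":
        order_byte, fmt = 1, "<BIdd"
    elif endianness == "XDR":
        order_byte, fmt = 0, ">BIdd"
    else:
        raise ValueError("endianness must be 'NDR' or 'XDR'")
    # Layout: byte-order byte, uint32 geometry type (1 = Point), x, y.
    return struct.pack(fmt, order_byte, 1, x, y).hex().upper()


# Little-endian form, the input literal used throughout the docs examples.
print(point_wkb_hex(1.0, 2.0))         # 0101000000000000000000F03F0000000000000040
# Big-endian form, the expected result of the new 'XDR' example.
print(point_wkb_hex(1.0, 2.0, "XDR"))  # 00000000013FF00000000000004000000000000000
```

Both printed strings agree with the values shown in the docs hunk, which suggests the new `'XDR'` example output is consistent with the `'NDR'` input literal.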