This is an automated email from the ASF dual-hosted git repository.

cloud-fan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new e648859ac6c3 [SPARK-56813][DOCS] Refine the documentation for geospatial types and functions
e648859ac6c3 is described below

commit e648859ac6c3a9918c81cc95b61b658c6b3dca54
Author: Uros Bojanic <[email protected]>
AuthorDate: Wed May 13 08:58:33 2026 +0800

    [SPARK-56813][DOCS] Refine the documentation for geospatial types and functions
    
    ### What changes were proposed in this pull request?
    Tighten the geospatial documentation:
    - `sql-ref-geospatial-types.md`: update the documentation for the supported ST functions.
    - `sql-ref-datatypes.md`: drop the unparameterized type syntax from the SQL, Scala, and PySpark type tables.
    - `sql-ref-functions-builtin.md`: surface the `st_funcs` group as **Geospatial ST Functions**.
    - `sql-migration-guide.md`: note that geospatial types and ST functions are enabled since 4.2.
    - etc.
    
    ### Why are the changes needed?
    Fill gaps and fix inaccuracies in the geospatial documentation.
    
    ### Does this PR introduce _any_ user-facing change?
    No.
    
    ### How was this patch tested?
    Existing tests suffice for docs-only changes.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    Generated-by: Claude Opus 4.7.
    
    Closes #55790 from uros-db/geo-docs-refine.
    
    Authored-by: Uros Bojanic <[email protected]>
    Signed-off-by: Wenchen Fan <[email protected]>
---
 docs/sql-ref-datatypes.md         | 16 ++++++++--------
 docs/sql-ref-functions-builtin.md |  5 +++++
 docs/sql-ref-geospatial-types.md  | 30 ++++++++++++++++++++----------
 sql/gen-sql-functions-docs.py     |  3 ++-
 4 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/docs/sql-ref-datatypes.md b/docs/sql-ref-datatypes.md
index 743ad4e3abb2..0ae05d8f46be 100644
--- a/docs/sql-ref-datatypes.md
+++ b/docs/sql-ref-datatypes.md
@@ -95,8 +95,8 @@ Spark SQL and DataFrames support the following data types:
 
 * Spatial types
   Spatial objects as defined in the [OGC Simple Feature 
Access](https://portal.ogc.org/files/?artifact_id=25355) specification.
-  - `GeometryType`: Represents GEOMETRY values—spatial objects in a Cartesian 
coordinate system. The type can be fixed to a single SRID, e.g. 
`geometry(4326)`, or allow mixed SRIDs with `geometry(any)`. Default SRID when 
not specified is 4326 (WGS 84).
-  - `GeographyType`: Represents GEOGRAPHY values—spatial objects in a 
geographic coordinate system (latitude/longitude). Edge interpolation is always 
SPHERICAL. The type can be fixed to a single SRID, e.g. `geography(4326)`, or 
allow mixed SRIDs with `geography(any)`. Default SRID is 4326 (WGS 84).
+  - `GeometryType`: Represents GEOMETRY values, spatial objects in a Cartesian 
coordinate system. The type can be fixed to a single SRID, e.g. 
`geometry(4326)`, or allow mixed SRIDs with `geometry(any)`. In SQL, `GEOMETRY` 
columns must always be declared with an explicit SRID or `ANY`.
+  - `GeographyType`: Represents GEOGRAPHY values, spatial objects in a 
geographic coordinate system (latitude/longitude). Edge interpolation is always 
SPHERICAL. The type can be fixed to a single geographic SRID, e.g. 
`geography(4326)`, or allow mixed SRIDs with `geography(any)`. In SQL, 
`GEOGRAPHY` columns must always be declared with an explicit SRID or `ANY`.
   For more details and built-in functions, see [Geospatial 
(Geometry/Geography) types](sql-ref-geospatial-types.html).
 
 * Complex types
@@ -143,8 +143,8 @@ from pyspark.sql.types import *
 |**TimestampNTZType**|datetime.datetime|TimestampNTZType()|
 |**DateType**|datetime.date|DateType()|
 |**DayTimeIntervalType**|datetime.timedelta|DayTimeIntervalType()|
-|**GeometryType**|Geometry|GeometryType() or GeometryType(*srid*)|
-|**GeographyType**|Geography|GeographyType() or GeographyType(*srid*)|
+|**GeometryType**|Geometry|GeometryType(*srid*)<br/>**Note:** *srid* is 
required and may be an `int` or the string `"ANY"`.|
+|**GeographyType**|Geography|GeographyType(*srid*)<br/>**Note:** *srid* is 
required and may be an `int` or the string `"ANY"`.|
 |**ArrayType**|list, tuple, or array|ArrayType(*elementType*, 
[*containsNull*])<br/>**Note:**The default value of *containsNull* is True.|
 |**MapType**|dict|MapType(*keyType*, *valueType*, 
[*valueContainsNull]*)<br/>**Note:**The default value of *valueContainsNull* is 
True.|
 |**StructType**|list or tuple|StructType(*fields*)<br/>**Note:** *fields* is a 
Seq of StructFields. Also, two fields with the same name are not allowed.|
@@ -179,8 +179,8 @@ You can access them by doing
 |**TimeType**|java.time.LocalTime|TimeType|
 |**YearMonthIntervalType**|java.time.Period|YearMonthIntervalType|
 |**DayTimeIntervalType**|java.time.Duration|DayTimeIntervalType|
-|**GeometryType**|org.apache.spark.sql.types.Geometry|GeometryType or 
GeometryType(*srid*)|
-|**GeographyType**|org.apache.spark.sql.types.Geography|GeographyType or 
GeographyType(*srid*)|
+|**GeometryType**|org.apache.spark.sql.types.Geometry|GeometryType(*srid*)|
+|**GeographyType**|org.apache.spark.sql.types.Geography|GeographyType(*srid*)|
 |**ArrayType**|scala.collection.Seq|ArrayType(*elementType*, 
[*containsNull]*)<br/>**Note:** The default value of *containsNull* is true.|
 |**MapType**|scala.collection.Map|MapType(*keyType*, *valueType*, 
[*valueContainsNull]*)<br/>**Note:** The default value of *valueContainsNull* 
is true.|
 |**StructType**|org.apache.spark.sql.Row|StructType(*fields*)<br/>**Note:** 
*fields* is a Seq of StructFields. Also, two fields with the same name are not 
allowed.|
@@ -272,8 +272,8 @@ The following table shows the type names as well as aliases 
used in Spark SQL pa
 |**DecimalType**|DECIMAL, DEC, NUMERIC|
 |**YearMonthIntervalType**|INTERVAL YEAR, INTERVAL YEAR TO MONTH, INTERVAL 
MONTH|
 |**DayTimeIntervalType**|INTERVAL DAY, INTERVAL DAY TO HOUR, INTERVAL DAY TO 
MINUTE, INTERVAL DAY TO SECOND, INTERVAL HOUR, INTERVAL HOUR TO MINUTE, 
INTERVAL HOUR TO SECOND, INTERVAL MINUTE, INTERVAL MINUTE TO SECOND, INTERVAL 
SECOND|
-|**GeometryType**|GEOMETRY or GEOMETRY(*srid*) or GEOMETRY(ANY)|
-|**GeographyType**|GEOGRAPHY or GEOGRAPHY(*srid*) or GEOGRAPHY(ANY)|
+|**GeometryType**|GEOMETRY(*srid*) or GEOMETRY(ANY)|
+|**GeographyType**|GEOGRAPHY(*srid*) or GEOGRAPHY(ANY)|
 |**ArrayType**|ARRAY\<element_type>|
 |**StructType**|STRUCT<field1_name: field1_type, field2_name: field2_type, 
...><br/> **Note:** ':' is optional.|
 |**MapType**|MAP<key_type, value_type>|
diff --git a/docs/sql-ref-functions-builtin.md 
b/docs/sql-ref-functions-builtin.md
index b6572609a34b..1912a1e577d5 100644
--- a/docs/sql-ref-functions-builtin.md
+++ b/docs/sql-ref-functions-builtin.md
@@ -126,3 +126,8 @@ license: |
 {% include_api_gen generated-variant-funcs-table.html %}
 #### Examples
 {% include_api_gen generated-variant-funcs-examples.html %}
+
+### Geospatial ST Functions
+{% include_api_gen generated-st-funcs-table.html %}
+#### Examples
+{% include_api_gen generated-st-funcs-examples.html %}
diff --git a/docs/sql-ref-geospatial-types.md b/docs/sql-ref-geospatial-types.md
index ed8b6597ae1f..d5a9d0fece84 100644
--- a/docs/sql-ref-geospatial-types.md
+++ b/docs/sql-ref-geospatial-types.md
@@ -25,8 +25,13 @@ Spark SQL supports **GEOMETRY** and **GEOGRAPHY** types for 
spatial data, as def
 
 | Type | Coordinate system | Typical use and notes |
 |------|-------------------|------------------------|
-| **GEOMETRY** | Cartesian (planar) | Projected or local coordinates; planar 
calculations. Represents points, lines, polygons in a flat coordinate system. 
Suitable for Web Mercator (SRID 3857), UTM, or local grids (e.g. 
engineering/CAD). Default SRID in Spark is 4326. |
-| **GEOGRAPHY** | Geographic (latitude/longitude) | Earth-based data; 
distances and areas on the sphere/ellipsoid. Coordinates in longitude and 
latitude (degrees). Edge interpolation is always **SPHERICAL**. Default SRID is 
4326 (WGS 84). |
+| **GEOMETRY** | Cartesian (planar) | Projected or local coordinates; planar 
calculations. Represents points, lines, polygons in a flat coordinate system. 
Suitable for Web Mercator (SRID 3857), UTM, or local grids (e.g. 
engineering/CAD). Accepts any SRID in the registry, including SRID 0 
(unspecified CRS). |
+| **GEOGRAPHY** | Geographic (latitude/longitude) | Earth-based data; 
distances and areas on the sphere/ellipsoid. Coordinates in longitude and 
latitude (degrees). Edge interpolation is always **SPHERICAL**. Only geographic 
SRIDs are accepted; the most common is 4326 (WGS 84). |
+
+In SQL, `GEOMETRY` and `GEOGRAPHY` columns must always be declared with an 
explicit SRID
+(or `ANY`); see [Type Syntax in SQL](#type-syntax-in-sql) below. When a value 
is constructed
+via `ST_GeomFromWKB(wkb)` without an explicit SRID, the value's SRID is `0` 
(unspecified),
+while `ST_GeogFromWKB(wkb)` always returns a value with SRID 4326.
 
 #### When to use GEOMETRY vs GEOGRAPHY
 
@@ -113,16 +118,18 @@ When parsing WKB, Spark applies the following rules. 
Violations result in a pars
 
 ### Built-in Geospatial (ST) Functions
 
-Spark SQL provides scalar functions for working with GEOMETRY and GEOGRAPHY 
values. They are grouped under **st_funcs** in the [Built-in 
Functions](sql-ref-functions-builtin.html) API.
+Spark SQL provides scalar functions for working with GEOMETRY and GEOGRAPHY 
values. The full list,
+with detailed argument descriptions and examples, is on the
+[Built-in Functions](sql-ref-functions-builtin.html#geospatial-st-functions) 
page under
+**Geospatial ST Functions**. The functions provided in the current release are 
summarized here:
 
 | Function | Description |
 |----------|-------------|
-| `ST_AsBinary(geo)` | Returns the GEOMETRY or GEOGRAPHY value as WKB 
(BINARY). |
-| `ST_GeomFromWKB(wkb)` | Parses WKB and returns a GEOMETRY with default SRID 
0. |
-| `ST_GeomFromWKB(wkb, srid)` | Parses WKB and returns a GEOMETRY with the 
given SRID. |
+| `ST_AsBinary(geo[, endianness])` | Returns the GEOMETRY or GEOGRAPHY value 
as WKB (BINARY). The optional `endianness` argument is `'NDR'` for 
little-endian (default) or `'XDR'` for big-endian. |
+| `ST_GeomFromWKB(wkb[, srid])` | Parses WKB and returns a GEOMETRY. The 
optional `srid` argument sets the SRID; if omitted, the SRID is `0`. |
 | `ST_GeogFromWKB(wkb)` | Parses WKB and returns a GEOGRAPHY with SRID 4326. |
 | `ST_Srid(geo)` | Returns the SRID of the GEOMETRY or GEOGRAPHY value (NULL 
if input is NULL). |
-| `ST_SetSrid(geo, srid)` | Returns a new GEOMETRY or GEOGRAPHY with the given 
SRID. |
+| `ST_SetSrid(geo, srid)` | Returns a new GEOMETRY or GEOGRAPHY with the given 
SRID. The new SRID must be valid for the value's type. |
 
 **Examples:**
 
@@ -130,6 +137,9 @@ Spark SQL provides scalar functions for working with 
GEOMETRY and GEOGRAPHY valu
 SELECT 
hex(ST_AsBinary(ST_GeogFromWKB(X'0101000000000000000000F03F0000000000000040')));
 -- 0101000000000000000000F03F0000000000000040
 
+SELECT 
hex(ST_AsBinary(ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040'), 
'XDR'));
+-- 00000000013FF00000000000004000000000000000
+
 SELECT ST_Srid(ST_GeogFromWKB(X'0101000000000000000000F03F0000000000000040'));
 -- 4326
 
@@ -139,9 +149,9 @@ SELECT 
ST_Srid(ST_SetSrid(ST_GeomFromWKB(X'0101000000000000000000F03F00000000000
 
 ### SRID and Stored Values
 
-* **Fixed-SRID columns**: Every value in the column must have the same SRID as 
the column type. Inserting a value with a different SRID can raise an error (or 
you can use `ST_SetSrid` to set the value’s SRID to match the column).
-* **Mixed-SRID columns** (`GEOMETRY(ANY)` or `GEOGRAPHY(ANY)`): Values can 
have different SRIDs. Only valid SRIDs are allowed.
-* **Storage**: Parquet, Delta, and Iceberg store geometry/geography with a 
fixed SRID per column; mixed-SRID types are for in-memory/query use. When 
writing to these formats, a concrete (fixed) SRID is required.
+* **Fixed-SRID columns**: Every value in the column must have the same SRID as 
the column type. Inserting a value with a different SRID raises a 
`GEO_ENCODER_SRID_MISMATCH_ERROR`. Use `ST_SetSrid` to change a value's SRID to 
match the column.
+* **Mixed-SRID columns** (`GEOMETRY(ANY)` or `GEOGRAPHY(ANY)`): Values can 
have different SRIDs per row. Each value must still have a valid SRID for the 
type; an invalid SRID raises `ST_INVALID_SRID_VALUE`.
+* **Storage**: Parquet, Delta, and Iceberg store geometry/geography with a 
fixed SRID per column. They do not support persisting `GEOMETRY(ANY)` or 
`GEOGRAPHY(ANY)`; mixed-SRID types exist for in-memory/query use only.
 
 ### Supported SRIDs
 
diff --git a/sql/gen-sql-functions-docs.py b/sql/gen-sql-functions-docs.py
index 13f9ae055fa7..2ae00f6db822 100644
--- a/sql/gen-sql-functions-docs.py
+++ b/sql/gen-sql-functions-docs.py
@@ -36,7 +36,8 @@ groups = {
     "bitwise_funcs", "conversion_funcs", "csv_funcs",
     "xml_funcs", "lambda_funcs", "collection_funcs",
     "url_funcs", "hash_funcs", "struct_funcs",
-    "table_funcs", "variant_funcs", "protobuf_funcs", "sketch_funcs"
+    "table_funcs", "variant_funcs", "protobuf_funcs", "sketch_funcs",
+    "st_funcs"
 }
 
 


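The new `'XDR'` example in the `sql-ref-geospatial-types.md` hunk above can be cross-checked without Spark. The following is a minimal Python sketch that re-encodes a 2D WKB point with the opposite byte order, using only the standard `struct` module. The helper name `swap_wkb_point_byte_order` is hypothetical and not part of Spark; the sketch assumes the plain OGC WKB point layout (1 byte-order byte, a 4-byte geometry type, two 8-byte doubles) and makes no claim about how `ST_AsBinary` is implemented.

```python
import struct


def swap_wkb_point_byte_order(wkb: bytes) -> bytes:
    """Re-encode a 2D WKB point with the opposite byte order.

    Hypothetical helper for illustration only; handles nothing but 2D points.
    """
    if len(wkb) != 21:
        raise ValueError("expected a 21-byte 2D WKB point")
    # Byte-order marker: 1 = little-endian (NDR), 0 = big-endian (XDR).
    if wkb[0] == 1:
        src, dst, dst_marker = "<", ">", 0
    elif wkb[0] == 0:
        src, dst, dst_marker = ">", "<", 1
    else:
        raise ValueError("invalid WKB byte-order marker")
    # Payload: uint32 geometry type followed by the x and y doubles.
    geom_type, x, y = struct.unpack(src + "Idd", wkb[1:])
    if geom_type != 1:
        raise ValueError("only 2D points (geometry type 1) are handled")
    return bytes([dst_marker]) + struct.pack(dst + "Idd", geom_type, x, y)


# The NDR input used in the documentation examples: POINT(1 2).
ndr = bytes.fromhex("0101000000000000000000F03F0000000000000040")
print(swap_wkb_point_byte_order(ndr).hex().upper())
```

For the documented input, the printed hex string should match the big-endian value shown in the added SQL example; applying the helper twice returns the original bytes.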