[PR] feat: add 40 native geospatial SQL functions (ST_ expressions) [datafusion-comet]

via GitHub Mon, 25 May 2026 06:24:19 -0700


sp-202 opened a new pull request, #4423:
URL: https://github.com/apache/datafusion-comet/pull/4423


   ## Summary
   
   - Adds 40 geospatial SQL functions registered as Spark SQL extensions via
     `SparkSessionExtensions.injectFunction`. All functions execute natively in
     the Rust/DataFusion engine when Comet is enabled
     (`spark.comet.exec.enabled=true`).
   - Geometries are represented as WKT strings, consistent with the existing
     Comet geo UDF convention.
   - Includes optional Sedona interop: if Apache Sedona is on the classpath,
     Sedona's `ST_*` expression classes are transparently mapped to the same
     Comet native UDFs via the serde layer, so Sedona queries also benefit from
     native acceleration.
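
To illustrate what operating on WKT strings means for an accessor such as `st_x` or `st_y`, here is a minimal std-only Rust sketch. It is not taken from this PR: the helper `parse_wkt_point` is hypothetical, it handles only 2D `POINT` text, and returning `None` for anything else is an assumption about null handling rather than Comet's documented behavior.

```rust
/// Parse a 2D WKT point such as "POINT (1.5 -2)" into (x, y).
/// Hypothetical helper: returns None for anything that is not a
/// well-formed, non-empty 2D point.
fn parse_wkt_point(wkt: &str) -> Option<(f64, f64)> {
    let trimmed = wkt.trim();
    // WKT keywords are case-insensitive, so compare the tag ignoring case.
    let head = trimmed.get(..5)?;
    if !head.eq_ignore_ascii_case("POINT") {
        return None;
    }
    // What remains must be exactly "( x y )"; "EMPTY" or "Z (...)" is rejected.
    let rest = trimmed[5..].trim();
    let inner = rest.strip_prefix('(')?.strip_suffix(')')?;
    let mut coords = inner.split_whitespace().map(|c| c.parse::<f64>());
    let x = coords.next()?.ok()?;
    let y = coords.next()?.ok()?;
    // Reject extra ordinates to keep the sketch strictly 2D.
    if coords.next().is_some() {
        return None;
    }
    Some((x, y))
}

/// Sketch of an `st_x`-style accessor over a WKT string.
fn st_x(wkt: &str) -> Option<f64> {
    parse_wkt_point(wkt).map(|(x, _)| x)
}

/// Sketch of an `st_y`-style accessor over a WKT string.
fn st_y(wkt: &str) -> Option<f64> {
    parse_wkt_point(wkt).map(|(_, y)| y)
}

fn main() {
    println!("{:?}", st_x("POINT (1.5 -2)"));
    println!("{:?}", st_y("point(1.5 -2)"));
    println!("{:?}", st_x("LINESTRING (0 0, 1 1)"));
}
```

A consequence of a string representation like this is that every call re-parses its input, which is one reason chained geometry functions can cost more than with a binary encoding.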
   
   ## New files
   
   - `spark/src/main/scala/org/apache/comet/expressions/GeoExpressions.scala`:
     40 Spark expression case classes and their `SparkSessionExtensions`
     registration descriptors.
   - `spark/src/main/scala/org/apache/comet/expressions/CometGeoFallback.scala`:
     JVM fallback stubs. Called only when Comet native execution is disabled
     and Sedona is not on the classpath; all methods throw
     `UnsupportedOperationException`.
   - `spark/src/main/scala/org/apache/comet/serde/geo.scala`:
     Serde map wiring each expression class to its named DataFusion scalar
     function.
   - `docs/geo-functions.md`:
     Reference documentation covering all 40 functions with signatures,
     parameter types, return types, descriptions, and SQL examples.
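
The evaluation paths described above can be summarized as a small decision function. This is only a sketch of the stated behavior, written in Rust for illustration (the actual files are Scala). The names `GeoEvalPath` and `resolve_eval_path` are hypothetical, and the `SedonaJvm` branch is an inference: the description says only that the stubs are not reached when Sedona is present.

```rust
/// Hypothetical summary of where an ST_ call ends up being evaluated.
#[derive(Debug, PartialEq)]
enum GeoEvalPath {
    /// Executed by the native Rust/DataFusion engine.
    CometNative,
    /// Assumed: Sedona's own JVM implementation handles the call.
    SedonaJvm,
    /// The fallback stub throws UnsupportedOperationException.
    Unsupported,
}

/// Pick the evaluation path from the two conditions named in the PR text.
fn resolve_eval_path(comet_exec_enabled: bool, sedona_on_classpath: bool) -> GeoEvalPath {
    if comet_exec_enabled {
        // With Comet enabled, both Comet and mapped Sedona expressions run natively.
        GeoEvalPath::CometNative
    } else if sedona_on_classpath {
        GeoEvalPath::SedonaJvm
    } else {
        GeoEvalPath::Unsupported
    }
}

fn main() {
    for comet in [true, false] {
        for sedona in [true, false] {
            println!(
                "comet={comet} sedona={sedona} -> {:?}",
                resolve_eval_path(comet, sedona)
            );
        }
    }
}
```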
   
   ## Functions added (40)
   
   | Category | Functions |
   |---|---|
   | Constructors | `st_geomfromwkt`, `st_geomfromgeojson`, `st_point`, `st_makeenvelope`, `st_makeline` |
   | Serializers | `st_astext`, `st_asgeojson` |
   | Predicates | `st_contains`, `st_intersects`, `st_within`, `st_covers`, `st_coveredby`, `st_equals`, `st_touches`, `st_crosses`, `st_disjoint`, `st_overlaps` |
   | Measurements | `st_area`, `st_length`, `st_perimeter`, `st_distance`, `st_distancesphere`, `st_hausdorffdistance`, `st_numpoints`, `st_x`, `st_y` |
   | Accessors | `st_isempty`, `st_geometrytype` |
   | Transformations | `st_centroid`, `st_envelope`, `st_convexhull`, `st_buffer`, `st_simplify`, `st_simplifypreservetopology`, `st_flipcoordinates`, `st_boundary` |
   | Set operations | `st_union`, `st_intersection`, `st_difference`, `st_symdifference` |
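
As an example of what a measurement in this list might compute, a spherical distance such as `st_distancesphere` is commonly implemented with the haversine formula. The sketch below rests on assumptions not stated in this PR: the haversine method itself, a mean Earth radius of 6,371,008.8 m, and inputs given as longitude/latitude in degrees. The function name `haversine_m` is hypothetical.

```rust
/// Assumed mean Earth radius in metres; the PR does not state which radius is used.
const EARTH_RADIUS_M: f64 = 6_371_008.8;

/// Great-circle distance in metres between two lon/lat points given in degrees,
/// computed with the haversine formula.
fn haversine_m(lon1: f64, lat1: f64, lon2: f64, lat2: f64) -> f64 {
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let d_phi = (lat2 - lat1).to_radians();
    let d_lambda = (lon2 - lon1).to_radians();
    let a = (d_phi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (d_lambda / 2.0).sin().powi(2);
    // Clamp before asin so rounding error cannot push the argument above 1.
    2.0 * EARTH_RADIUS_M * a.sqrt().min(1.0).asin()
}

fn main() {
    // One degree of latitude along a meridian, in metres.
    let d = haversine_m(0.0, 0.0, 0.0, 1.0);
    println!("{d:.1}");
}
```

Under these assumptions, one degree of latitude comes out at roughly 111.2 km, which is a convenient sanity check when comparing results against another engine.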
   
   ## Test plan
   
   - [ ] `./mvnw scalastyle:check` passes on all changed Scala files
   - [ ] `./mvnw spotless:check` passes (scalafmt formatting)
   - [ ] `./mvnw compile` succeeds with no errors or warnings
   - [ ] Verified all 40 functions are registered in a live `spark-shell`
         session (`spark.sessionState.functionRegistry.lookupFunction`)
   - [ ] Ran a stress test with 10 000 parquet rows across 10 partitions:
         per-row transforms (18 functions), aggregation + shuffle
         (`CometHashAggregate`), broadcast self-join
         (`CometBroadcastHashJoin`), and window functions all produced
         correct results
   - [ ] Query plans confirm `CometNativeScan` + `CometProject` throughout,
         with no fallback to JVM evaluation


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
