sp-202 opened a new pull request, #4423:
URL: https://github.com/apache/datafusion-comet/pull/4423
## Summary
- Adds 40 geospatial SQL functions, registered as Spark SQL extensions via `SparkSessionExtensions.injectFunction`. All functions execute natively in the Rust/DataFusion engine when Comet is enabled (`spark.comet.exec.enabled=true`).
- Geometries are represented as WKT strings, consistent with the existing Comet geo UDF convention.
- Includes optional Sedona interop: if Apache Sedona is on the classpath, Sedona's `ST_*` expression classes are transparently mapped to the same Comet native UDFs via the serde layer, so Sedona queries also benefit from native acceleration.
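The Sedona interop idea can be pictured with a small standalone sketch. This is not the code in this PR: the package name `org.apache.spark.sql.sedona_sql.expressions`, the `SedonaInteropSketch` object, and the assumption that a native UDF name equals the lower-cased Sedona class name are all illustrative assumptions. It only shows a name mapping guarded by a classpath probe, and runs as a Scala script (REPL or scala-cli `.sc`).

```scala
import java.util.Locale
import scala.util.Try

// Hypothetical sketch of Sedona interop; NOT the PR's implementation.
object SedonaInteropSketch {
  // Assumed package of Sedona's Spark SQL expression classes.
  val SedonaPackage = "org.apache.spark.sql.sedona_sql.expressions"

  // True when `className` can be loaded (without initialising the class).
  def onClasspath(className: String): Boolean =
    Try(Class.forName(className, false, getClass.getClassLoader)).isSuccess

  // Maps a Sedona expression name such as "ST_Contains" to the native UDF
  // name assumed to be shared with Comet's own function ("st_contains"),
  // but only if Comet supports that name AND the Sedona class is present.
  def nativeNameFor(sedonaName: String, supported: Set[String]): Option[String] = {
    val native = sedonaName.toLowerCase(Locale.ROOT)
    if (supported.contains(native) && onClasspath(s"$SedonaPackage.$sedonaName")) Some(native)
    else None
  }
}

val supported = Set("st_contains", "st_intersects")
println(SedonaInteropSketch.onClasspath("java.lang.String")) // true
// None unless Sedona's ST_Contains class is on the classpath.
println(SedonaInteropSketch.nativeNameFor("ST_Contains", supported))
```

Returning `None` rather than throwing keeps the mapping optional: when Sedona is absent, the lookup simply produces no native mapping.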
## New files
- `spark/src/main/scala/org/apache/comet/expressions/GeoExpressions.scala`: 40 Spark expression case classes and their `SparkSessionExtensions` registration descriptors.
- `spark/src/main/scala/org/apache/comet/expressions/CometGeoFallback.scala`: JVM fallback stubs, invoked only when Comet native execution is disabled and Sedona is not on the classpath; every method throws `UnsupportedOperationException`.
- `spark/src/main/scala/org/apache/comet/serde/geo.scala`: serde map wiring each expression class to its named DataFusion scalar function.
- `docs/geo-functions.md`: reference documentation covering all 40 functions, with signatures, parameter types, return types, descriptions, and SQL examples.
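The relationship between the serde map and the fallback stubs can be sketched without Spark. In the sketch below, `GeoExpr`, `StArea`, `StContains`, `StUnknown`, `GeoSerdeSketch` and `CometGeoFallbackSketch` are hypothetical stand-ins, not the PR's classes. The native function names are also assumed to match the SQL names. The point is the shape: a lookup keyed by expression class, and a fallback that always throws.

```scala
// Hypothetical stand-ins for expression case classes; geometries are WKT
// strings, as described in this PR. Not the PR's actual definitions.
sealed trait GeoExpr
final case class StArea(geomWkt: String) extends GeoExpr
final case class StContains(leftWkt: String, rightWkt: String) extends GeoExpr
final case class StUnknown(geomWkt: String) extends GeoExpr // deliberately unmapped

object GeoSerdeSketch {
  // Expression class -> DataFusion scalar function name (names assumed).
  val nativeFunctions: Map[Class[_ <: GeoExpr], String] = Map(
    classOf[StArea] -> "st_area",
    classOf[StContains] -> "st_contains"
  )

  // None means "no native mapping", i.e. the expression cannot run natively.
  def nativeName(e: GeoExpr): Option[String] = nativeFunctions.get(e.getClass)
}

object CometGeoFallbackSketch {
  // Mirrors the described JVM stubs: evaluation on the JVM always fails.
  def evaluate(functionName: String): Nothing =
    throw new UnsupportedOperationException(
      s"$functionName requires Comet native execution or Apache Sedona")
}

println(GeoSerdeSketch.nativeName(StArea("POLYGON ((0 0, 1 0, 1 1, 0 0))"))) // Some(st_area)
println(GeoSerdeSketch.nativeName(StUnknown("POINT (1 2)")))                  // None
```

Keying on the class rather than on the SQL name means aliases and differently named Sedona expressions can share a single native entry.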
## Functions added (40)
| Category | Functions |
|---|---|
| Constructors | `st_geomfromwkt`, `st_geomfromgeojson`, `st_point`, `st_makeenvelope`, `st_makeline` |
| Serializers | `st_astext`, `st_asgeojson` |
| Predicates | `st_contains`, `st_intersects`, `st_within`, `st_covers`, `st_coveredby`, `st_equals`, `st_touches`, `st_crosses`, `st_disjoint`, `st_overlaps` |
| Measurements | `st_area`, `st_length`, `st_perimeter`, `st_distance`, `st_distancesphere`, `st_hausdorffdistance`, `st_numpoints`, `st_x`, `st_y` |
| Accessors | `st_isempty`, `st_geometrytype` |
| Transformations | `st_centroid`, `st_envelope`, `st_convexhull`, `st_buffer`, `st_simplify`, `st_simplifypreservetopology`, `st_flipcoordinates`, `st_boundary` |
| Set operations | `st_union`, `st_intersection`, `st_difference`, `st_symdifference` |
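The function table can also be encoded as data, which makes it easy to confirm that it really lists 40 distinct names. The same list could then feed a registration check. This is a sketch; the `geoFunctions` and `allNames` values are illustrative and not part of the PR.

```scala
// The table above as data: category -> SQL function names.
val geoFunctions: Seq[(String, Seq[String])] = Seq(
  "Constructors" -> Seq("st_geomfromwkt", "st_geomfromgeojson", "st_point", "st_makeenvelope", "st_makeline"),
  "Serializers" -> Seq("st_astext", "st_asgeojson"),
  "Predicates" -> Seq("st_contains", "st_intersects", "st_within", "st_covers", "st_coveredby",
    "st_equals", "st_touches", "st_crosses", "st_disjoint", "st_overlaps"),
  "Measurements" -> Seq("st_area", "st_length", "st_perimeter", "st_distance", "st_distancesphere",
    "st_hausdorffdistance", "st_numpoints", "st_x", "st_y"),
  "Accessors" -> Seq("st_isempty", "st_geometrytype"),
  "Transformations" -> Seq("st_centroid", "st_envelope", "st_convexhull", "st_buffer", "st_simplify",
    "st_simplifypreservetopology", "st_flipcoordinates", "st_boundary"),
  "Set operations" -> Seq("st_union", "st_intersection", "st_difference", "st_symdifference")
)

val allNames: Seq[String] = geoFunctions.flatMap(_._2)

// Consistency checks against the "(40)" in the heading.
assert(allNames.size == 40, s"expected 40 functions, found ${allNames.size}")
assert(allNames.distinct.size == allNames.size, "duplicate function name in table")

// In a live spark-shell with the extension loaded, the names could then be
// checked the way the test plan describes (not executed here):
//   import org.apache.spark.sql.catalyst.FunctionIdentifier
//   allNames.filter(n => spark.sessionState.functionRegistry
//     .lookupFunction(FunctionIdentifier(n)).isEmpty)  // expect empty
```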
## Test plan
- [ ] `./mvnw scalastyle:check` passes on all changed Scala files
- [ ] `./mvnw spotless:check` passes (scalafmt formatting)
- [ ] `./mvnw compile` succeeds with no errors or warnings
- [ ] Verified all 40 functions are registered in a live `spark-shell` session (`spark.sessionState.functionRegistry.lookupFunction`)
- [ ] Ran a stress test with 10,000 Parquet rows across 10 partitions: per-row transforms (18 functions), aggregation + shuffle (`CometHashAggregate`), broadcast self-join (`CometBroadcastHashJoin`), and window functions all produced correct results
- [ ] Query plans confirm `CometNativeScan` + `CometProject` throughout, with no fallback to JVM evaluation
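The "no fallback" check in the last item could be automated with a small helper over operator names. The sketch below assumes that Comet operators are exactly those whose name starts with `Comet`. It also uses a hypothetical allow-list of transition nodes. Both assumptions would need to be adjusted to the plans Comet actually produces. `PlanCheckSketch` is illustrative and not part of this PR.

```scala
// Hypothetical helper: report operators that are neither Comet operators
// nor in an assumed allow-list of wrapper/transition nodes.
object PlanCheckSketch {
  // Assumed allow-list; adjust to the nodes your plans actually contain.
  val allowed: Set[String] = Set("AdaptiveSparkPlan", "ColumnarToRow")

  def jvmFallbacks(nodeNames: Seq[String]): Seq[String] =
    nodeNames.filterNot(n => n.startsWith("Comet") || allowed.contains(n))
}

// In spark-shell, names could come from a real plan, e.g. with
// spark.sql.adaptive.enabled=false (under AQE the plan must be unwrapped first):
//   val names = df.queryExecution.executedPlan.collect { case p => p.nodeName }
// A hand-written list stands in for that output here.
val sample = Seq("ColumnarToRow", "CometProject", "CometNativeScan")
println(PlanCheckSketch.jvmFallbacks(sample)) // List()
```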