Re: [PR] [GH-2731] Implement GeoSeries: frechet_distance, hausdorff_distance, geom_equals, interpolate, project [sedona]

via GitHub Fri, 13 Mar 2026 00:58:03 -0700


Copilot commented on code in PR #2732:
URL: https://github.com/apache/sedona/pull/2732#discussion_r2929662860



##########
common/src/main/java/org/apache/sedona/common/Functions.java:
##########
@@ -1485,7 +1486,8 @@ public static Geometry triangulatePolygon(Geometry geom) {
     return ConstrainedDelaunayTriangulator.triangulate(geom);
   }
 
-  public static double lineLocatePoint(Geometry geom, Geometry point) {
+  public static Double lineLocatePoint(Geometry geom, Geometry point) {
+    if (geom.isEmpty()) return null;
     double length = geom.getLength();
     LengthIndexedLine indexedLine = new LengthIndexedLine(geom);
     return indexedLine.indexOf(point.getCoordinate()) / length;

Review Comment:
   `lineLocatePoint` can still return `NaN`/`Infinity` (or throw) for 
valid-but-degenerate inputs. If `geom` is non-empty but has `length == 0` 
(e.g., repeated coordinates), this does `index / 0`. Also, if `point` is empty, 
`point.getCoordinate()` is null and `indexOf(...)` will NPE. Please add guards 
(e.g., return 0.0 for zero-length lines and null for empty points) to avoid 
producing invalid results / runtime failures and to better match 
PostGIS/GeoPandas behavior.
   



##########
common/src/test/java/org/apache/sedona/common/FunctionsTest.java:
##########
@@ -4407,6 +4407,21 @@ public void lineLocatePoint() {
     assertEquals(expectedResult3, actual3, FP_TOLERANCE);
   }
 
+  @Test
+  public void lineLocatePointEmpty() {
+    LineString emptyLine = GEOMETRY_FACTORY.createLineString();
+    Geometry point = GEOMETRY_FACTORY.createPoint(new Coordinate(1, 1));
+    assertNull(Functions.lineLocatePoint(emptyLine, point));
+  }
+
+  @Test

Review Comment:
   The new empty-geometry tests are helpful, but they don’t cover the closely 
related degenerate case where a LineString is non-empty but has zero length 
(e.g., repeated coordinates), which currently leads to `0/0 -> NaN` in 
`Functions.lineLocatePoint`. Adding an assertion for that case (and for an 
empty `point` argument) would prevent regressions and validate the intended 
semantics.
   



##########
python/sedona/spark/geopandas/geoseries.py:
##########
@@ -1330,6 +1330,89 @@ def distance(self, other, align=None) -> pspd.Series:
         )
         return result
 
+    def frechet_distance(self, other, align=None, densify=None) -> pspd.Series:
+        if densify is not None:
+            raise NotImplementedError(
+                "Sedona does not support the densify parameter for 
frechet_distance."
+            )
+
+        other_series, extended = self._make_series_of_val(other)
+        align = False if extended else align
+
+        spark_expr = stf.ST_FrechetDistance(F.col("L"), F.col("R"))
+        result = self._row_wise_operation(
+            spark_expr,
+            other_series,
+            align,
+            default_val=None,
+        )
+        return result
+
+    def hausdorff_distance(self, other, align=None, densify=None) -> 
pspd.Series:
+        other_series, extended = self._make_series_of_val(other)
+        align = False if extended else align
+
+        if densify is not None:
+            spark_expr = stf.ST_HausdorffDistance(F.col("L"), F.col("R"), 
densify)
+        else:
+            spark_expr = stf.ST_HausdorffDistance(F.col("L"), F.col("R"))
+        result = self._row_wise_operation(
+            spark_expr,
+            other_series,
+            align,
+            default_val=None,
+        )
+        return result
+
+    def geom_equals(self, other, align=None) -> pspd.Series:
+        other_series, extended = self._make_series_of_val(other)
+        align = False if extended else align
+
+        spark_expr = stp.ST_Equals(F.col("L"), F.col("R"))
+        result = self._row_wise_operation(
+            spark_expr,
+            other_series,
+            align,
+            returns_geom=False,
+            default_val=False,
+        )
+        return _to_bool(result)
+
+    def interpolate(self, distance, normalized=False) -> "GeoSeries":
+        other_series, extended = self._make_series_of_val(distance)
+        align = not extended
+
+        if normalized:
+            spark_expr = stf.ST_LineInterpolatePoint(F.col("L"), F.col("R"))
+        else:
+            length = stf.ST_Length(F.col("L"))
+            fraction = F.when(length == 0, F.lit(0.0)).otherwise(F.col("R") / 
length)
+            spark_expr = stf.ST_LineInterpolatePoint(F.col("L"), fraction)
+        return self._row_wise_operation(
+            spark_expr,
+            other_series,
+            align=align,
+            returns_geom=True,
+        )
+
+    def project(self, other, normalized=False, align=None) -> pspd.Series:
+        other_series, extended = self._make_series_of_val(other)
+        align = False if extended else align
+
+        if normalized:
+            spark_expr = stf.ST_LineLocatePoint(F.col("L"), F.col("R"))
+        else:
+            spark_expr = stf.ST_LineLocatePoint(F.col("L"), F.col("R")) * 
stf.ST_Length(
+                F.col("L")
+            )

Review Comment:
   `project(normalized=False)` multiplies `ST_LineLocatePoint(...)` by 
`ST_Length(L)`. For zero-length (non-empty) lines, `ST_LineLocatePoint` 
currently computes `index/length` (0/0) which yields `NaN`, and `NaN * 0` 
remains `NaN` (instead of the expected 0.0 absolute distance). Consider 
explicitly handling `ST_Length(L) == 0` (return 0.0) for both normalized and 
absolute modes to avoid propagating `NaN`.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [GH-2731] Implement GeoSeries: frechet_distance, hausdorff_distance, geom_equals, interpolate, project [sedona]

Reply via email to