Copilot commented on code in PR #2622:
URL: https://github.com/apache/sedona/pull/2622#discussion_r2778284627


##########
snowflake/src/main/java/org/apache/sedona/snowflake/snowsql/udtfs/ST_Envelope_Aggr.java:
##########
@@ -50,6 +50,9 @@ public ST_Envelope_Aggr() {}
 
   public Stream<OutputRow> process(byte[] geom) throws ParseException {
     Geometry geometry = GeometrySerde.deserialize(geom);
+    if (geometry.isEmpty()) {

Review Comment:
   `process` calls `geometry.isEmpty()` without guarding for `geom == null` (SQL NULL input) or `geometry == null` (if deserialization yields null). This can throw an NPE and contradicts the documented “null values are skipped” behavior. Add a null check (e.g., return `Stream.empty()` when `geom`/`geometry` is null) before calling `isEmpty()`.
   ```suggestion
       // Skip null input geometries (SQL NULL) as per "null values are skipped" behavior.
       if (geom == null) {
         return Stream.empty();
       }
   
       Geometry geometry = GeometrySerde.deserialize(geom);
       // Skip if deserialization yields null or an empty geometry.
       if (geometry == null || geometry.isEmpty()) {
   ```



##########
snowflake/src/main/java/org/apache/sedona/snowflake/snowsql/udtfs/ST_Envelope_Aggr.java:
##########
@@ -59,6 +62,9 @@ public Stream<OutputRow> process(byte[] geom) throws ParseException {
   }
 
   public Stream<OutputRow> endPartition() {
+    if (buffer == null || buffer.isNull()) {
+      return Stream.empty();

Review Comment:
   Returning `Stream.empty()` from `endPartition()` may produce **no output row** rather than a single row whose value is `NULL`, depending on how this UDTF is wrapped/queried. If the intended SQL semantics are “aggregate result is NULL”, consider emitting one `OutputRow` with a null geometry payload instead of emitting zero rows (or ensure the wrapper layer converts an empty stream to NULL reliably).
   ```suggestion
         // No non-empty geometries were seen; emit a single row with a NULL envelope
         return Stream.of(new OutputRow(null));
   ```



##########
snowflake/src/main/java/org/apache/sedona/snowflake/snowsql/udtfs/ST_Envelope_Agg.java:
##########
@@ -50,6 +50,9 @@ public ST_Envelope_Agg() {}
 
   public Stream<OutputRow> process(byte[] geom) throws ParseException {
     Geometry geometry = GeometrySerde.deserialize(geom);
+    if (geometry.isEmpty()) {

Review Comment:
   Same issue as `ST_Envelope_Aggr`: `geometry.isEmpty()` can throw an NPE for SQL NULL inputs (or null deserialization). Guard `geom`/`geometry` for null to correctly “skip null values”.
   ```suggestion
       if (geom == null) {
         // Skip SQL NULL inputs
         return Stream.empty();
       }
       Geometry geometry = GeometrySerde.deserialize(geom);
       if (geometry == null || geometry.isEmpty()) {
         // Skip null or empty geometries
   ```



##########
flink/src/test/java/org/apache/sedona/flink/AggregatorTest.java:
##########
@@ -46,6 +47,19 @@ public void testEnvelop_Aggr() {
         last.getField(0).toString());
   }
 
+  @Test
+  public void testEnvelop_Aggr_EmptyGeometries() {

Review Comment:
   Typo in the new test name: `testEnvelop_Aggr_EmptyGeometries` is missing the “e” in “Envelope”. Renaming to `testEnvelope_Aggr_EmptyGeometries` improves clarity and consistency with the function name.
   ```suggestion
     public void testEnvelope_Aggr_EmptyGeometries() {
   ```



##########
spark/common/src/test/scala/org/apache/sedona/sql/aggregateFunctionTestScala.scala:
##########
@@ -50,6 +50,24 @@ class aggregateFunctionTestScala extends TestBaseScala {
       assert(boundary.take(1)(0).get(0) == geometryFactory.createPolygon(coordinates))
     }
 
+    it("Passed ST_Envelope_aggr with empty geometries returns null") {
+      val emptyDf = sparkSession.sql(
+        "SELECT ST_GeomFromWKT(wkt) as geom FROM VALUES ('POINT EMPTY'), ('LINESTRING EMPTY'), ('POLYGON EMPTY') AS t(wkt)")
+      emptyDf.createOrReplaceTempView("emptydf")
+      val result = sparkSession.sql("SELECT ST_Envelope_Aggr(emptydf.geom) FROM emptydf")
+      assert(result.take(1)(0).get(0) == null)
+    }
+
+    it("Passed ST_Envelope_aggr with mixed empty and non-empty geometries") {
+      val mixedDf = sparkSession.sql(
+        "SELECT ST_GeomFromWKT(wkt) as geom FROM VALUES ('POINT EMPTY'), ('POINT (1 2)'), ('POINT (3 4)') AS t(wkt)")
+      mixedDf.createOrReplaceTempView("mixeddf")
+      val result = sparkSession.sql("SELECT ST_Envelope_Aggr(mixeddf.geom) FROM mixeddf")
+      val envelope = result.take(1)(0).get(0).asInstanceOf[Geometry]
+      assert(envelope != null)
+      assert(!envelope.isEmpty)
+    }

Review Comment:
   This test only asserts that the result is non-null and non-empty, but it doesn’t verify that empties are actually skipped correctly (i.e., that the computed bounds match the non-empty points). Strengthen the assertion by checking the expected envelope coordinates (min/max X/Y) or by comparing against an expected polygon/envelope for points (1,2) and (3,4).
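   As one way to make the expectation concrete, here is a plain-Java sketch (hypothetical and dependency-free — no Spark or JTS; class and variable names are illustrative) of the exact bounds the mixed test should produce once `POINT EMPTY` is skipped:

   ```java
   // Hypothetical sketch: with 'POINT EMPTY' skipped, the envelope of
   // POINT (1 2) and POINT (3 4) must span exactly [1, 3] x [2, 4].
   public class EnvelopeExpectation {
       public static void main(String[] args) {
           // Points surviving the skip-empties step of the aggregation.
           double[][] points = {{1, 2}, {3, 4}};
           double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
           double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
           for (double[] p : points) {
               minX = Math.min(minX, p[0]); maxX = Math.max(maxX, p[0]);
               minY = Math.min(minY, p[1]); maxY = Math.max(maxY, p[1]);
           }
           // The test could assert these exact bounds (or the equivalent WKT
           // 'POLYGON ((1 2, 1 4, 3 4, 3 2, 1 2))') instead of only
           // checking non-null / non-empty.
           System.out.println(minX + "," + minY + "," + maxX + "," + maxY);
       }
   }
   ```

   In the Scala test, the same check could compare `envelope.getEnvelopeInternal` against a JTS `new Envelope(1, 3, 2, 4)`.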



##########
flink/src/main/java/org/apache/sedona/flink/expressions/Aggregators.java:
##########
@@ -66,7 +67,9 @@ public void accumulate(
                 rawSerializer = GeometryTypeSerializer.class,
                 bridgedTo = Geometry.class)
             Object o) {
-      Envelope envelope = ((Geometry) o).getEnvelopeInternal();
+      Geometry geometry = (Geometry) o;
+      if (geometry.isEmpty()) return;
+      Envelope envelope = geometry.getEnvelopeInternal();

Review Comment:
   `accumulate` does not handle `o == null`. If Flink calls `accumulate` with nulls for nullable columns, this will throw at the cast / `isEmpty()`. To align with “null values are skipped”, add an early return when `o` is null before casting/inspecting the geometry.
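   A minimal, dependency-free sketch of the guard ordering being requested (the `double[]` stands in for the geometry argument; names are illustrative, not the actual Flink `Aggregators` code):

   ```java
   import java.util.Arrays;
   import java.util.List;

   // Illustrative accumulator: null is rejected before any cast/dereference,
   // empties are skipped next, and only then is the envelope expanded.
   public class AccumulateGuard {
       static double minX = Double.POSITIVE_INFINITY;
       static double maxX = Double.NEGATIVE_INFINITY;
       static int accepted = 0;

       // 'geom' models the Object parameter: null ~ SQL NULL row,
       // a zero-length array ~ an empty geometry.
       static void accumulate(double[] geom) {
           if (geom == null) return;      // skip SQL NULL before casting/inspecting
           if (geom.length == 0) return;  // skip empty geometry (isEmpty() analogue)
           for (double x : geom) {
               minX = Math.min(minX, x);
               maxX = Math.max(maxX, x);
           }
           accepted++;
       }

       public static void main(String[] args) {
           List<double[]> rows = Arrays.asList(null, new double[0], new double[]{1, 3});
           for (double[] r : rows) accumulate(r);
           // Only the one non-null, non-empty row contributes to the bounds.
           System.out.println(accepted + ":" + minX + "-" + maxX);
       }
   }
   ```

   The point is purely the ordering: the null check must come before the cast and the `isEmpty()` call, mirroring the null guard already suggested for the Snowflake UDTFs above.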



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
