Copilot commented on code in PR #2628:
URL: https://github.com/apache/sedona/pull/2628#discussion_r2779099746


##########
snowflake-tester/src/test/java/org/apache/sedona/snowflake/snowsql/TestFunctionsV2.java:
##########
@@ -451,6 +451,19 @@ public void test_ST_GeoHash() {
         "u3r0p");
   }
 
+  @Test
+  public void test_ST_GeoHashNeighbors() {
+    registerUDFV2("ST_GeoHashNeighbors", String.class);
+    verifySqlSingleRes("select sedona.ST_GeoHashNeighbor('u1pb', 'n')", "u1pc");

Review Comment:
   This test registers `ST_GeoHashNeighbors` but executes `ST_GeoHashNeighbor`, 
so it doesn’t validate the neighbors UDF at all. Update the query/assertion to 
call `sedona.ST_GeoHashNeighbors(...)` and verify the returned array contents 
(or at least the expected length/value at known indices).
   ```suggestion
        verifySqlSingleRes("select ARRAY_SIZE(sedona.ST_GeoHashNeighbors('u1pb'))", 8);
   ```



##########
common/src/main/java/org/apache/sedona/common/utils/GeoHashNeighbor.java:
##########
@@ -0,0 +1,266 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.sedona.common.utils;
+
+/**
+ * Utility class for computing geohash neighbors.
+ *
+ * <p>The neighbor algorithm is ported from the geohash-java library by Silvio Heuberger
+ * (Apache 2.0 License). It works by de-interleaving the geohash bits into separate
+ * latitude/longitude components, incrementing/decrementing the appropriate component,
+ * masking for wrap-around, and re-interleaving back to a geohash string.
+ *
+ * @see <a href="https://github.com/kungfoo/geohash-java">geohash-java</a>
+ */
+public class GeoHashNeighbor {
+
+  private static final int MAX_BIT_PRECISION = 64;
+  private static final long FIRST_BIT_FLAGGED = 0x8000000000000000L;
+
+  // Shared base32 constants from GeoHashUtils
+  private static final char[] base32 = GeoHashUtils.BASE32_CHARS;
+  private static final int[] decodeArray = GeoHashUtils.DECODE_ARRAY;
+
+  /**
+   * Returns all 8 neighbors of the given geohash in the order: [N, NE, E, SE, S, SW, W, NW].
+   *
+   * @param geohash the geohash string
+   * @return array of 8 neighbor geohash strings, or null if input is null
+   */
+  public static String[] getNeighbors(String geohash) {
+    if (geohash == null) {
+      return null;
+    }
+
+    long[] parsed = parseGeohash(geohash);
+    long bits = parsed[0];
+    int significantBits = (int) parsed[1];
+
+    // Ported from geohash-java: GeoHash.getAdjacent()
+    String northern = neighborLat(bits, significantBits, 1);
+    String eastern = neighborLon(bits, significantBits, 1);
+    String southern = neighborLat(bits, significantBits, -1);
+    String western = neighborLon(bits, significantBits, -1);
+
+    // Diagonal neighbors: compose two cardinal moves
+    long[] northBits = parseGeohash(northern);
+    long[] southBits = parseGeohash(southern);
+
+    return new String[] {
+      northern,
+      neighborLon(northBits[0], significantBits, 1), // NE
+      eastern,
+      neighborLon(southBits[0], significantBits, 1), // SE
+      southern,
+      neighborLon(southBits[0], significantBits, -1), // SW
+      western,
+      neighborLon(northBits[0], significantBits, -1) // NW
+    };
+  }
+
+  /**
+   * Returns the neighbor of the given geohash in the specified direction.
+   *
+   * @param geohash the geohash string
+   * @param direction compass direction: "n", "ne", "e", "se", "s", "sw", "w", "nw"
+   *     (case-insensitive)
+   * @return the neighbor geohash string, or null if input geohash is null
+   */
+  public static String getNeighbor(String geohash, String direction) {
+    if (geohash == null) {
+      return null;
+    }
+    if (direction == null) {
+      throw new IllegalArgumentException(
+          "Direction cannot be null. Valid values: n, ne, e, se, s, sw, w, nw");
+    }
+
+    long[] parsed = parseGeohash(geohash);
+    long bits = parsed[0];
+    int significantBits = (int) parsed[1];
+
+    switch (direction.toLowerCase()) {
+      case "n":
+        return neighborLat(bits, significantBits, 1);
+      case "s":
+        return neighborLat(bits, significantBits, -1);
+      case "e":
+        return neighborLon(bits, significantBits, 1);
+      case "w":
+        return neighborLon(bits, significantBits, -1);
+      case "ne":
+        {
+          long[] n = parseGeohash(neighborLat(bits, significantBits, 1));
+          return neighborLon(n[0], significantBits, 1);
+        }
+      case "se":
+        {
+          long[] s = parseGeohash(neighborLat(bits, significantBits, -1));
+          return neighborLon(s[0], significantBits, 1);
+        }
+      case "sw":
+        {
+          long[] s = parseGeohash(neighborLat(bits, significantBits, -1));
+          return neighborLon(s[0], significantBits, -1);
+        }
+      case "nw":
+        {
+          long[] n = parseGeohash(neighborLat(bits, significantBits, 1));
+          return neighborLon(n[0], significantBits, -1);
+        }
+      default:
+        throw new IllegalArgumentException(
+            "Invalid direction: '" + direction + "'. Valid values: n, ne, e, se, s, sw, w, nw");
+    }
+  }
+
+  /**
+   * Parses a geohash string into [bits, significantBits]. Based on geohash-java
+   * GeoHash.fromGeohashString().
+   */
+  private static long[] parseGeohash(String geohash) {
+    if (geohash.isEmpty()) {
+      throw new IllegalArgumentException("Geohash string cannot be empty");
+    }
+    long bits = 0;
+    int significantBits = 0;
+    for (int i = 0; i < geohash.length(); i++) {
+      char c = geohash.charAt(i);
+      int cd;
+      if (c >= decodeArray.length || (cd = decodeArray[c]) < 0) {
+        throw new IllegalArgumentException(
+            "Invalid character '" + c + "' in geohash '" + geohash + "'");
+      }
+      for (int j = 0; j < GeoHashUtils.BITS_PER_CHAR; j++) {
+        significantBits++;
+        bits <<= 1;
+        if ((cd & (16 >> j)) != 0) {
+          bits |= 0x1;
+        }
+      }
+    }
+    bits <<= (MAX_BIT_PRECISION - significantBits);
+    return new long[] {bits, significantBits};
+  }

Review Comment:
   `parseGeohash` does not guard against inputs whose bit-length exceeds 64
(e.g., geohash length >= 13 implies significantBits >= 65). In that case
`bits <<= (MAX_BIT_PRECISION - significantBits)` is a negative shift (which the
JVM masks to the low six bits), so it silently yields incorrect results. Add
validation that rejects geohashes where `geohash.length() * 5 > MAX_BIT_PRECISION`
(or `significantBits > MAX_BIT_PRECISION`) with a clear `IllegalArgumentException`.
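   A guard along these lines could work (a minimal standalone sketch, not the PR's code; the class and method names here are hypothetical, and the constants mirror `GeoHashNeighbor.MAX_BIT_PRECISION` and the 5-bits-per-character base32 encoding):

```java
// Hypothetical helper illustrating the proposed length guard.
public class GeohashLengthGuard {
  private static final int MAX_BIT_PRECISION = 64;
  private static final int BITS_PER_CHAR = 5;

  static void validateLength(String geohash) {
    // 13 chars * 5 bits = 65 bits, which no longer fits in a single long
    if (geohash.length() * BITS_PER_CHAR > MAX_BIT_PRECISION) {
      throw new IllegalArgumentException(
          "Geohash '" + geohash + "' is too long: at most "
              + (MAX_BIT_PRECISION / BITS_PER_CHAR) + " characters are supported");
    }
  }

  public static void main(String[] args) {
    validateLength("u1pb"); // 20 significant bits: accepted
    boolean threw = false;
    try {
      validateLength("0123456789abc"); // 13 chars => 65 bits: rejected
    } catch (IllegalArgumentException e) {
      threw = true;
    }
    System.out.println("rejected 13-char geohash: " + threw);
  }
}
```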



##########
spark/common/src/test/scala/org/apache/sedona/sql/functionTestScala.scala:
##########
@@ -4277,4 +4281,33 @@ class functionTestScala
 
     FileUtils.deleteDirectory(new File(tmpDir))
   }
+
+  it("Should pass ST_GeoHashNeighbors") {
+    var result = sparkSession.sql("SELECT ST_GeoHashNeighbors('u1pb')").first().getList[String](0)
+    assert(result.size() == 8)
+    assert(result.get(0) == "u1pc") // N
+    assert(result.get(2) == "u300") // E
+    assert(result.get(4) == "u0zz") // S
+    assert(result.get(6) == "u1p8") // W
+
+    result = sparkSession.sql("SELECT ST_GeoHashNeighbors('dqcjqc')").first().getList[String](0)
+    assert(result.size() == 8)
+    val expected =
+      Set("dqcjqf", "dqcjr4", "dqcjr1", "dqcjr0", "dqcjq9", "dqcjq8", "dqcjqb", "dqcjqd")
+    val actual = (0 until 8).map(result.get).toSet
+    assert(actual == expected)
+  }
+
+  it("Should pass ST_GeoHashNeighbor") {
+    var result = sparkSession.sql("SELECT ST_GeoHashNeighbor('u1pb', 'n')").first().getString(0)
+    assert(result == "u1pc")
+    result = sparkSession.sql("SELECT ST_GeoHashNeighbor('u1pb', 'e')").first().getString(0)
+    assert(result == "u300")
+    result = sparkSession.sql("SELECT ST_GeoHashNeighbor('u1pb', 's')").first().getString(0)
+    assert(result == "u0zz")
+    result = sparkSession.sql("SELECT ST_GeoHashNeighbor('u1pb', 'w')").first().getString(0)
+    assert(result == "u1p8")
+    result = sparkSession.sql("SELECT ST_GeoHashNeighbor('u1pb', 'NE')").first().getString(0)
+    assert(result == "u301")

Review Comment:
   The expected `NE` neighbor value here is inconsistent with the docs example 
added in this PR (which shows `u1pd` as the NE neighbor in the ordered 
neighbors array) and with the Flink test expecting `u1pd` for `NE`. Align the 
expected values across Spark/Flink tests and the docs to the actual 
implementation output, so cross-engine behavior and documentation stay 
consistent.
   ```suggestion
       assert(result == "u1pd")
   ```



##########
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/InferredExpression.scala:
##########
@@ -263,6 +271,12 @@ object InferredTypes {
       } else {
         null
       }
+    } else if (t =:= typeOf[Array[String]]) { output =>
+      if (output != null) {
+        ArrayData.toArrayData(output.asInstanceOf[Array[String]].map(UTF8String.fromString))

Review Comment:
   This serialization path will throw if the `Array[String]` contains null 
elements (`UTF8String.fromString(null)` will NPE). Map nulls through (e.g., map 
to `null` UTF8String entries) before calling `ArrayData.toArrayData`.
   ```suggestion
           ArrayData.toArrayData(
             output.asInstanceOf[Array[String]].map { s =>
               if (s != null) UTF8String.fromString(s) else null
             })
   ```



##########
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/InferredExpression.scala:
##########
@@ -220,6 +222,12 @@ object InferredTypes {
       expr.asString(input)
     } else if (t =:= typeOf[Array[Long]]) { expr => input =>
       expr.eval(input).asInstanceOf[ArrayData].toLongArray()
+    } else if (t =:= typeOf[Array[String]]) { expr => input =>
+      expr.eval(input).asInstanceOf[ArrayData] match {
+        case null => null
+        case arrayData: ArrayData =>
+          (0 until arrayData.numElements()).map(i => arrayData.getUTF8String(i).toString).toArray

Review Comment:
   This conversion will throw if the Spark `ArrayData` contains null elements 
(`getUTF8String(i)` can return null / `isNullAt(i)` true). Handle null slots by 
checking `arrayData.isNullAt(i)` and emitting `null` in the resulting 
`Array[String]` for those positions.
   ```suggestion
             (0 until arrayData.numElements()).map { i =>
               if (arrayData.isNullAt(i)) {
                 null
               } else {
                 val utf8 = arrayData.getUTF8String(i)
                 if (utf8 == null) null else utf8.toString
               }
             }.toArray
   ```



##########
snowflake-tester/src/test/java/org/apache/sedona/snowflake/snowsql/TestFunctions.java:
##########
@@ -471,6 +471,19 @@ public void test_ST_GeoHash() {
         "u3r0p");
   }
 
+  @Test
+  public void test_ST_GeoHashNeighbors() {
+    registerUDF("ST_GeoHashNeighbors", String.class);
+    verifySqlSingleRes("select sedona.ST_GeoHashNeighbor('u1pb', 'n')", "u1pc");

Review Comment:
   Same issue as V2: the test registers `ST_GeoHashNeighbors` but calls 
`ST_GeoHashNeighbor`, so the neighbors function is untested. Change the SQL to 
`sedona.ST_GeoHashNeighbors('u1pb')` and assert the expected array result.
   ```suggestion
        verifySqlSingleRes("select ARRAY_SIZE(sedona.ST_GeoHashNeighbors('u1pb'))", 8);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
