Copilot commented on code in PR #2628:
URL: https://github.com/apache/sedona/pull/2628#discussion_r2781126589


##########
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/InferredExpression.scala:
##########
@@ -220,6 +222,18 @@ object InferredTypes {
       expr.asString(input)
     } else if (t =:= typeOf[Array[Long]]) { expr => input =>
       expr.eval(input).asInstanceOf[ArrayData].toLongArray()
+    } else if (t =:= typeOf[Array[String]]) { expr => input =>
+      expr.eval(input).asInstanceOf[ArrayData] match {
+        case null => null
+        case arrayData: ArrayData =>
+          (0 until arrayData.numElements()).map { i =>
+            if (arrayData.isNullAt(i)) null
+            else {
+              val utf8 = arrayData.getUTF8String(i)
+              if (utf8 == null) null else utf8.toString
+            }
+          }.toArray

Review Comment:
   This conversion allocates an intermediate Scala collection via `(0 until 
...).map(...).toArray`. Since this path can be hot for expression evaluation, 
consider switching to a pre-sized `new Array[String](numElements)` plus a 
`while` loop to reduce allocations and improve throughput.
   ```suggestion
             val n = arrayData.numElements()
             val result = new Array[String](n)
             var i = 0
             while (i < n) {
               if (arrayData.isNullAt(i)) {
                 result(i) = null
               } else {
                 val utf8 = arrayData.getUTF8String(i)
                 result(i) = if (utf8 == null) null else utf8.toString
               }
               i += 1
             }
             result
   ```



##########
common/src/main/java/org/apache/sedona/common/utils/GeoHashUtils.java:
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.sedona.common.utils;
+
+import java.util.Arrays;
+
+/**
+ * Shared constants and utilities for geohash encoding/decoding.
+ *
+ * <p>Centralizes the base32 alphabet and decode lookup table previously 
duplicated across {@link
+ * GeoHashDecoder}, {@link PointGeoHashEncoder}, and {@link GeoHashNeighbor}.
+ */
+public class GeoHashUtils {
+
+  /** The base32 alphabet used by geohash encoding (Gustavo Niemeyer's 
specification). */
+  public static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";
+
+  /** Base32 character array for index-based lookup. */
+  public static final char[] BASE32_CHARS = BASE32.toCharArray();
+
+  /** Bit masks for extracting 5-bit groups: {16, 8, 4, 2, 1}. */
+  public static final int[] BITS = new int[] {16, 8, 4, 2, 1};
+
+  /** Number of bits per base32 character. */
+  public static final int BITS_PER_CHAR = 5;
+
+  /**
+   * Reverse lookup array: maps a character (as int) to its base32 index. 
Invalid characters map to
+   * -1.
+   */
+  public static final int[] DECODE_ARRAY = new int['z' + 1];

Review Comment:
   GeoHashUtils exposes mutable arrays (`BASE32_CHARS`, `BITS`, `DECODE_ARRAY`) 
as `public static final`. Since arrays are mutable, any code in the same JVM 
can modify them at runtime and silently corrupt geohash 
encoding/decoding/neighboring results. Consider making these arrays `private` 
and exposing only safe methods (e.g., `decodeChar`) and/or returning defensive 
copies (`clone()`) where array access is required.



##########
common/src/main/java/org/apache/sedona/common/Functions.java:
##########
@@ -746,6 +746,14 @@ public static String geohash(Geometry geometry, int 
precision) {
     return GeometryGeoHashEncoder.calculate(geometry, precision);
   }
 
+  public static String[] geohashNeighbors(String geohash) {
+    return GeoHashNeighbor.getNeighbors(geohash);
+  }
+
+  public static String geohashNeighbor(String geohash, String direction) {
+    return GeoHashNeighbor.getNeighbor(geohash, direction);
+  }

Review Comment:
   These are new public `common` APIs but currently have no JavaDoc describing 
key behavioral details (e.g., neighbor ordering for `geohashNeighbors`, 
accepted direction values/case-sensitivity, and what inputs throw 
`IllegalArgumentException` vs return `null`). Adding JavaDoc here would make 
the contract clear for downstream consumers beyond SQL wrappers.



##########
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/st_functions.scala:
##########
@@ -246,6 +246,16 @@ object st_functions {
   def ST_GeoHash(geometry: String, precision: Int): Column =
     wrapExpression[ST_GeoHash](geometry, precision)
 
+  def ST_GeoHashNeighbors(geohash: Column): Column =
+    wrapExpression[ST_GeoHashNeighbors](geohash)
+  def ST_GeoHashNeighbors(geohash: String): Column =
+    wrapExpression[ST_GeoHashNeighbors](geohash)
+
+  def ST_GeoHashNeighbor(geohash: Column, direction: Column): Column =
+    wrapExpression[ST_GeoHashNeighbor](geohash, direction)

Review Comment:
   The Scala DataFrame API wrappers provide `(Column, Column)` and `(String, 
String)` overloads for `ST_GeoHashNeighbor`, but not mixed overloads like 
`(Column, String)` / `(String, Column)`. Many Sedona Spark wrappers include 
these mixed overloads to avoid forcing callers to wrap literals with 
`lit(...)`. Consider adding the mixed overloads for consistency and usability.
   ```suggestion
       wrapExpression[ST_GeoHashNeighbor](geohash, direction)
     def ST_GeoHashNeighbor(geohash: Column, direction: String): Column =
       wrapExpression[ST_GeoHashNeighbor](geohash, direction)
     def ST_GeoHashNeighbor(geohash: String, direction: Column): Column =
       wrapExpression[ST_GeoHashNeighbor](geohash, direction)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to