Copilot commented on code in PR #2332: URL: https://github.com/apache/sedona/pull/2332#discussion_r2325822947
########## python/sedona/spark/geopandas/sindex.py: ########## @@ -170,10 +186,14 @@ def nearest( Note: Unlike Geopandas, Sedona does not support geometry input of type np.array or GeoSeries. + Note: nearest() has different behaviors depending on how the index is constructed. + When constructed from a np.array, this method returns indices like original geopandas. + When constructed from a GeoSeries or PySparkDataFrame, this method returns geometries. + Returns ------- list or tuple - List of indices of nearest geometries, optionally with distances. + List of the nearest geometries, optionally with distances. Review Comment: Similar to the query method, this docstring states it returns geometries but should acknowledge that it returns indices when constructed from np.array, based on the behavioral notes above. ```suggestion If the index was constructed from a np.array, returns a list of indices (or a tuple of (indices, distances) if return_distance is True). If constructed from a GeoSeries or PySparkDataFrame, returns a list of geometries (or a tuple of (geometries, distances) if return_distance is True). ``` ########## python/tests/geopandas/test_sindex.py: ########## @@ -63,6 +63,31 @@ def setup_method(self): ] ) + def test_construct_from_geoseries(self): + # Construct from a GeoSeries + gs = GeoSeries([Point(x, x) for x in range(5)]) + sindex = SpatialIndex(gs) + result = sindex.query(Point(2, 2)) + # SpatialIndex constructed from GeoSeries return geometries + assert result == [Point(2, 2)] + + def test_construct_from_pyspark_dataframe(self): + # Construct from PySparkDataFrame + df = self.spark.createDataFrame( + [(Point(x, x),) for x in range(5)], ["geometry"] + ) + sindex = SpatialIndex(df, column_name="geometry") + result = sindex.query(Point(2, 2)) + assert result == [Point(2, 2)] + + def test_construct_from_nparray(self): + # Construct from np.array + array = np.array([Point(x, x) for x in range(5)]) + sindex = SpatialIndex(array) + result = sindex.query(Point(2, 2)) + # Returns indices like original geopandas + assert result.tolist() == np.array([2]) Review Comment: The assertion compares a list to a numpy array, which will always be False. Use `assert result.tolist() == [2]` or `assert result == np.array([2])` instead. ```suggestion assert result.tolist() == [2] ``` ########## python/sedona/spark/geopandas/sindex.py: ########## @@ -220,20 +243,28 @@ def nearest( def intersection(self, bounds): """ - Find geometries that intersect the given bounding box. + Find geometries that intersect the given bounding box. Similar to the Geopandas version, + this is a compatibility wrapper for rtree.index.Index.intersection, use query instead. Parameters ---------- bounds : tuple Bounding box as (min_x, min_y, max_x, max_y). + Note: intersection() has different behaviors depending on how the index is constructed. + When constructed from a np.array, this method returns indices like original geopandas. + When constructed from a GeoSeries or PySparkDataFrame, this method returns geometries. + + Note: Unlike Geopandas, Sedona does not support geometry input of type np.array or GeoSeries. + It is recommended to instead use GeoSeries.intersects directly. + Returns ------- list - List of indices of matching geometries. + List of matching geometries. Review Comment: The return description should be consistent with the behavioral notes and acknowledge that indices are returned when constructed from np.array. ```suggestion List of matching indices (if constructed from a np.array) or list of matching geometries (if constructed from a GeoSeries or PySparkDataFrame). ``` ########## python/sedona/spark/geopandas/sindex.py: ########## @@ -82,12 +93,17 @@ def query(self, geometry: BaseGeometry, predicate: str = None, sort: bool = Fals sort : bool, optional, default False Whether to sort the results. + Note: query() has different behaviors depending on how the index is constructed. + When constructed from a np.array, this method returns indices like original geopandas. + When constructed from a GeoSeries or PySparkDataFrame, this method returns geometries. + Note: Unlike Geopandas, Sedona does not support geometry input of type np.array or GeoSeries. + It is recommended to instead use GeoSeries.intersects directly. Returns ------- list - List of indices of matching geometries. + List of matching geometries. Review Comment: The docstring states this returns 'List of matching geometries' but the note above indicates it returns indices when constructed from np.array. The return description should acknowledge both behaviors or be more generic. ```suggestion If the index was constructed from a np.array, returns a list of indices of matching geometries. If constructed from a GeoSeries or PySparkDataFrame, returns a list of matching geometries. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@sedona.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org