Re: [PR] [GH-2227] Add Geopandas Developer Guide README.md [sedona]

via GitHub Wed, 06 Aug 2025 00:28:19 -0700


Copilot commented on code in PR #2229:
URL: https://github.com/apache/sedona/pull/2229#discussion_r2256129052



##########
docs/community/geopandas.md:
##########
@@ -0,0 +1,69 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+# Geopandas on Sedona
+
+This folder contains the source code for the GeoPandas component of Apache 
Sedona. This README.md will outline a few important considerations when 
contributing changes to this component as a developer. Again **this README.md 
is targeted towards contributors**, the official documentation is more tailored 
towards users.
+
+**General Approach**: This component is built on top of the PySpark Pandas 
API. The `GeoDataFrame` and `GeoSeries` classes both inherit from pyspark 
pandas' `ps.DataFrame` and `ps.Series` classes, respectively. When possible, it 
is generally preferred to leverage pyspark pandas' implementation of a function 
and extending it from there (i.e. find a way to leverage `super()` rather than 
copying over parts of logic). The code structure resembles the structure of the 
[Geopandas repo](https://github.com/geopandas/geopandas).
+
+**Lazy Evaluation**: Spark uses lazy evaluation. Spark's distributed and lazy 
nature occasionally comes in the way of implementing functionality in the same 
way the original GeoPandas library does so. For example, GeoPandas has many 
checks for invalid crs in many places (e.g `GeoSeries.__init__()`, 
`set_crs()`). Sedona's implementation for getting the `crs` currently is 
expensive compared to GeoPandas because it requires us to run an eager 
`ST_SRID()` query. If we eagerly query for the crs in every initialization of 
`GeoSeries`, all of our functions calls (e.g `.area()`, etc) would also become 
eager and would incur a noticeable slowdown, resulting a poor user experience.

Review Comment:
   Grammar error: 'resulting a poor user experience' should be 'resulting in a 
poor user experience'.
   ```suggestion
   **Lazy Evaluation**: Spark uses lazy evaluation. Spark's distributed and 
lazy nature occasionally comes in the way of implementing functionality in the 
same way the original GeoPandas library does so. For example, GeoPandas has 
many checks for invalid crs in many places (e.g `GeoSeries.__init__()`, 
`set_crs()`). Sedona's implementation for getting the `crs` currently is 
expensive compared to GeoPandas because it requires us to run an eager 
`ST_SRID()` query. If we eagerly query for the crs in every initialization of 
`GeoSeries`, all of our function calls (e.g `.area()`, etc) would also become 
eager and would incur a noticeable slowdown, resulting in a poor user 
experience.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [GH-2227] Add Geopandas Developer Guide README.md [sedona]

Reply via email to