Copilot commented on code in PR #2229: URL: https://github.com/apache/sedona/pull/2229#discussion_r2256129052
########## docs/community/geopandas.md: ########## @@ -0,0 +1,69 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + --> + +# Geopandas on Sedona + +This folder contains the source code for the GeoPandas component of Apache Sedona. This README.md will outline a few important considerations when contributing changes to this component as a developer. Again **this README.md is targeted towards contributors**, the official documentation is more tailored towards users. + +**General Approach**: This component is built on top of the PySpark Pandas API. The `GeoDataFrame` and `GeoSeries` classes both inherit from pyspark pandas' `ps.DataFrame` and `ps.Series` classes, respectively. When possible, it is generally preferred to leverage pyspark pandas' implementation of a function and extending it from there (i.e. find a way to leverage `super()` rather than copying over parts of logic). The code structure resembles the structure of the [Geopandas repo](https://github.com/geopandas/geopandas). + +**Lazy Evaluation**: Spark uses lazy evaluation. Spark's distributed and lazy nature occasionally comes in the way of implementing functionality in the same way the original GeoPandas library does so. For example, GeoPandas has many checks for invalid crs in many places (e.g `GeoSeries.__init__()`, `set_crs()`). Sedona's implementation for getting the `crs` currently is expensive compared to GeoPandas because it requires us to run an eager `ST_SRID()` query. If we eagerly query for the crs in every initialization of `GeoSeries`, all of our functions calls (e.g `.area()`, etc) would also become eager and would incur a noticeable slowdown, resulting a poor user experience. Review Comment: Grammar error: 'resulting a poor user experience' should be 'resulting in a poor user experience'. ```suggestion **Lazy Evaluation**: Spark uses lazy evaluation. Spark's distributed and lazy nature occasionally comes in the way of implementing functionality in the same way the original GeoPandas library does so. For example, GeoPandas has many checks for invalid crs in many places (e.g `GeoSeries.__init__()`, `set_crs()`). Sedona's implementation for getting the `crs` currently is expensive compared to GeoPandas because it requires us to run an eager `ST_SRID()` query. If we eagerly query for the crs in every initialization of `GeoSeries`, all of our functions calls (e.g `.area()`, etc) would also become eager and would incur a noticeable slowdown, resulting in a poor user experience. ``` ########## docs/community/geopandas.md: ########## @@ -0,0 +1,69 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + --> + +# Geopandas on Sedona + +This folder contains the source code for the GeoPandas component of Apache Sedona. This README.md will outline a few important considerations when contributing changes to this component as a developer. Again **this README.md is targeted towards contributors**, the official documentation is more tailored towards users. Review Comment: The reference to 'this folder' and 'this README.md' is confusing since this is a .md file in a documentation site, not a README.md in a source code folder. Consider rephrasing to 'This guide' or 'This document' for clarity. ```suggestion This guide outlines a few important considerations when contributing changes to the GeoPandas component of Apache Sedona as a developer. Again, **this guide is targeted towards contributors**; the official documentation is more tailored towards users. ``` ########## docs/community/geopandas.md: ########## @@ -0,0 +1,69 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + --> + +# Geopandas on Sedona + +This folder contains the source code for the GeoPandas component of Apache Sedona. This README.md will outline a few important considerations when contributing changes to this component as a developer. Again **this README.md is targeted towards contributors**, the official documentation is more tailored towards users. + +**General Approach**: This component is built on top of the PySpark Pandas API. The `GeoDataFrame` and `GeoSeries` classes both inherit from pyspark pandas' `ps.DataFrame` and `ps.Series` classes, respectively. When possible, it is generally preferred to leverage pyspark pandas' implementation of a function and extending it from there (i.e. find a way to leverage `super()` rather than copying over parts of logic). The code structure resembles the structure of the [Geopandas repo](https://github.com/geopandas/geopandas). + +**Lazy Evaluation**: Spark uses lazy evaluation. Spark's distributed and lazy nature occasionally comes in the way of implementing functionality in the same way the original GeoPandas library does so. For example, GeoPandas has many checks for invalid crs in many places (e.g `GeoSeries.__init__()`, `set_crs()`). Sedona's implementation for getting the `crs` currently is expensive compared to GeoPandas because it requires us to run an eager `ST_SRID()` query. If we eagerly query for the crs in every initialization of `GeoSeries`, all of our functions calls (e.g `.area()`, etc) would also become eager and would incur a noticeable slowdown, resulting a poor user experience. Review Comment: Grammar error: 'all of our functions calls' should be 'all of our function calls' (remove 's' from 'functions'). ```suggestion **Lazy Evaluation**: Spark uses lazy evaluation. Spark's distributed and lazy nature occasionally comes in the way of implementing functionality in the same way the original GeoPandas library does so. For example, GeoPandas has many checks for invalid crs in many places (e.g `GeoSeries.__init__()`, `set_crs()`). Sedona's implementation for getting the `crs` currently is expensive compared to GeoPandas because it requires us to run an eager `ST_SRID()` query. If we eagerly query for the crs in every initialization of `GeoSeries`, all of our function calls (e.g `.area()`, etc) would also become eager and would incur a noticeable slowdown, resulting a poor user experience. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
