Copilot commented on code in PR #2040:
URL: https://github.com/apache/sedona/pull/2040#discussion_r2180807845
##########
python/sedona/geopandas/geoseries.py:
##########
@@ -1023,6 +1252,42 @@ def from_shapely(
def from_arrow(cls, arr, **kwargs) -> "GeoSeries":
raise NotImplementedError("GeoSeries.from_arrow() is not implemented yet.")
+ @classmethod
+ def _create_from_select(
+ cls, select: str, data, schema, index, crs, **kwargs
+ ) -> "GeoSeries":
+
+ from pyspark.pandas.utils import default_session
+ from pyspark.pandas.internal import InternalField
+ import numpy as np
+
+ if isinstance(data, list) and not isinstance(data[0], (tuple, list)):
+ data = [(obj,) for obj in data]
+
+ select = f"{select} as geometry"
+
+ print(data)
+ print(select)
Review Comment:
Remove the `print(select)` debug statement to avoid unintended console
output; if insight into the SQL expression is needed, use a logger instead.
```suggestion
import logging
logger = logging.getLogger(__name__)
logger.info(data)
logger.info(select)
```
##########
python/sedona/geopandas/geoseries.py:
##########
@@ -1023,6 +1252,42 @@ def from_shapely(
def from_arrow(cls, arr, **kwargs) -> "GeoSeries":
raise NotImplementedError("GeoSeries.from_arrow() is not implemented yet.")
+ @classmethod
+ def _create_from_select(
+ cls, select: str, data, schema, index, crs, **kwargs
+ ) -> "GeoSeries":
+
+ from pyspark.pandas.utils import default_session
+ from pyspark.pandas.internal import InternalField
+ import numpy as np
+
+ if isinstance(data, list) and not isinstance(data[0], (tuple, list)):
+ data = [(obj,) for obj in data]
+
+ select = f"{select} as geometry"
+
+ print(data)
+ print(select)
Review Comment:
The debug `print(data)` statement can clutter logs in production. Consider
removing it or replacing it with a structured logging call at an appropriate
log level.
```suggestion
logger.debug(data)
logger.debug(select)
```
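For reference, a minimal sketch of the logging pattern both comments point toward, assuming a module-level logger and DEBUG-level messages. `build_select` is a hypothetical stand-in for the relevant part of `_create_from_select`, not the actual Sedona code, and the `if data` guard for empty lists is an addition made for the sketch.

```python
import logging

# Module-level logger, created once at import time instead of on every call.
logger = logging.getLogger(__name__)


def build_select(select: str, data: list) -> str:
    """Hypothetical stand-in for the start of _create_from_select."""
    # Wrap bare objects in 1-tuples; the `data and` guard (an assumption,
    # not in the PR) avoids indexing into an empty list.
    if data and not isinstance(data[0], (tuple, list)):
        data = [(obj,) for obj in data]

    select = f"{select} as geometry"

    # Lazy %-style arguments: the message is only formatted when DEBUG is
    # enabled. Logging the row count rather than the rows themselves keeps
    # large inputs out of the log.
    logger.debug("rows=%d select=%s", len(data), select)
    return select


if __name__ == "__main__":
    # Callers opt in to debug output; the library itself stays silent.
    logging.basicConfig(level=logging.DEBUG)
    build_select("ST_GeomFromWKT(wkt)", ["POINT (0 0)"])
```

With this shape nothing is written unless the application configures logging at DEBUG level, which is the main practical difference from `print`.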
##########
python/sedona/geopandas/geoseries.py:
##########
@@ -132,6 +129,20 @@ def __init__(
"allow_override=True)' to overwrite CRS or "
"'GeoSeries.to_crs(crs)' to reproject geometries. "
)
+ # This is a temporary workaround since pyspark errors when creating a ps.Series from a ps.Series
+ # This is NOT a scalable solution since we call to_pandas() on the data and is a hacky solution
+ # but this should be resolved if/once https://github.com/apache/spark/pull/51300 is merged in.
+ # For now, we reset self._anchor = data to have keep the geometry information (e.g crs) that's lost in to_pandas()
+ super().__init__(
+ data=data.to_pandas(),
+ index=index,
+ dtype=dtype,
+ name=name,
+ copy=copy,
+ fastpath=fastpath,
+ )
+
+ self._anchor = data
Review Comment:
After the workaround, `self._col_label` is not restored. This may break
geometry column labeling. Please reassign `self._col_label = index` (or the
appropriate label) immediately after `self._anchor = data`.
```suggestion
self._anchor = data
self._col_label = index
```