Copilot commented on code in PR #632:
URL: https://github.com/apache/sedona-db/pull/632#discussion_r2819248762


##########
python/sedonadb/python/sedonadb/dataframe.py:
##########
@@ -416,6 +417,91 @@ def to_parquet(
             overwrite_bbox_columns,
         )
 
+    def to_pyogrio(
+        self,
+        path: Union[str, Path, io.BytesIO],
+        *,
+        driver: Optional[str] = None,
+        geometry_type: Optional[str] = None,
+        geometry_name: Optional[str] = None,
+        crs: Optional[str] = None,
+        append: bool = False,
+        **kwargs,
+    ):
+        """Write using GDAL/OGR via pyogrio
+
+        Writes this DataFrame batchwise to a file using GDAL/OGR using the
+        implementation provided by the pyogrio package. This is the same backend
+        used by GeoPandas and this function is a light wrapper around
+        `pyogrio.raw.write_arrow()` that fills in default values using
+        information available to the DataFrame (e.g., geometry column and CRS).
+
+        Args:
+            path: An output path or `BytesIO` output buffer.
+            driver: An explicit GDAL OGR driver. Usually inferred from `path` but
+                must be provided if path is a `BytesIO`. Not all drivers support
+                writing to `BytesIO`.
+            geometry_type: A GeoJSON-style geometry type or `None` to provide an
+                inferred default value (which may be `"Unknown"`). This is required
+                to write some types of output (e.g. Shapefiles) and may provide
+                files that are more efficiently read.
+            geometry_name: The column to write as the primary geometry column. If
+                `None`, the name of the geometry column will be inferred.
+            crs: An optional string overriding the CRS of `geometry_name`.
+            append: Use `True` to append to the file for drivers that support
+                appending.
+            kwargs: Extra arguments passed to `pyogrio.raw.write_arrow()`.
+
+        Examples:
+
+            >>> import tempfile
+            >>> sd = sedona.db.connect()
+            >>> td = tempfile.TemporaryDirectory()
+            >>> sd.sql("SELECT ST_Point(0, 1, 3857)").to_pyogrio(f"{td.name}/tmp.fgb")
+            >>> sd.read_pyogrio(f"{td.name}/tmp.fgb").show()
+            ┌──────────────┐
+            │ wkb_geometry │
+            │   geometry   │
+            ╞══════════════╡
+            │ POINT(0 1)   │
+            └──────────────┘
+        """
+        if geometry_name is None:
+            geometry_name = self._impl.primary_geometry_column()
+
+        if crs is None:
+            inferred_crs = self.schema.field(geometry_name).type.crs
+            crs = None if inferred_crs is None else inferred_crs.to_json()

Review Comment:
   If the DataFrame has no geometry columns, `primary_geometry_column()` 
appears to return a falsy value (see `to_pandas()` which checks `if 
geometry:`). In that case `self.schema.field(geometry_name)` will raise a 
confusing exception. Consider validating `geometry_name` after inference and 
raising a clear error (e.g., require `geometry_name` when there is no geometry 
column, or error out that `to_pyogrio()` requires a geometry column).
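One way to express the validation suggested above is a small helper that runs right after inference. This is only a sketch: `resolve_geometry_name` is a hypothetical name, and it assumes (as the comment does) that inference yields a falsy value such as `None` or `""` when there is no geometry column.

```python
from typing import Optional


def resolve_geometry_name(
    explicit_name: Optional[str], inferred_name: Optional[str]
) -> str:
    """Pick the geometry column to write, failing early if there is none.

    Hypothetical helper; not part of the PR.
    """
    # An explicitly passed geometry_name always wins over inference.
    if explicit_name is not None:
        return explicit_name

    # Assumption: inference returns a falsy value (None or "") when the
    # DataFrame has no geometry column.
    if not inferred_name:
        raise ValueError(
            "to_pyogrio() requires a geometry column: none was found and "
            "geometry_name was not provided"
        )

    return inferred_name
```

Inside `to_pyogrio()` this could be called as `resolve_geometry_name(geometry_name, self._impl.primary_geometry_column())` before the CRS lookup, so the schema access never sees a missing column name.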



##########
python/sedonadb/python/sedonadb/dataframe.py:
##########
@@ -416,6 +417,91 @@ def to_parquet(
             overwrite_bbox_columns,
         )
 
+    def to_pyogrio(
+        self,
+        path: Union[str, Path, io.BytesIO],
+        *,
+        driver: Optional[str] = None,
+        geometry_type: Optional[str] = None,
+        geometry_name: Optional[str] = None,
+        crs: Optional[str] = None,
+        append: bool = False,
+        **kwargs,
+    ):
+        """Write using GDAL/OGR via pyogrio
+
+        Writes this DataFrame batchwise to a file using GDAL/OGR using the
+        implementation provided by the pyogrio package. This is the same backend
+        used by GeoPandas and this function is a light wrapper around
+        `pyogrio.raw.write_arrow()` that fills in default values using
+        information available to the DataFrame (e.g., geometry column and CRS).
+
+        Args:
+            path: An output path or `BytesIO` output buffer.
+            driver: An explicit GDAL OGR driver. Usually inferred from `path` but
+                must be provided if path is a `BytesIO`. Not all drivers support
+                writing to `BytesIO`.
+            geometry_type: A GeoJSON-style geometry type or `None` to provide an
+                inferred default value (which may be `"Unknown"`). This is required
+                to write some types of output (e.g. Shapefiles) and may provide
+                files that are more efficiently read.
+            geometry_name: The column to write as the primary geometry column. If
+                `None`, the name of the geometry column will be inferred.
+            crs: An optional string overriding the CRS of `geometry_name`.
+            append: Use `True` to append to the file for drivers that support
+                appending.
+            kwargs: Extra arguments passed to `pyogrio.raw.write_arrow()`.
+
+        Examples:
+
+            >>> import tempfile
+            >>> sd = sedona.db.connect()
+            >>> td = tempfile.TemporaryDirectory()
+            >>> sd.sql("SELECT ST_Point(0, 1, 3857)").to_pyogrio(f"{td.name}/tmp.fgb")
+            >>> sd.read_pyogrio(f"{td.name}/tmp.fgb").show()
+            ┌──────────────┐
+            │ wkb_geometry │
+            │   geometry   │
+            ╞══════════════╡
+            │ POINT(0 1)   │
+            └──────────────┘
+        """
+        if geometry_name is None:
+            geometry_name = self._impl.primary_geometry_column()
+
+        if crs is None:
+            inferred_crs = self.schema.field(geometry_name).type.crs
+            crs = None if inferred_crs is None else inferred_crs.to_json()
+
+        if geometry_type is None:
+            # This is required for pyogrio.raw.write_arrow(). We could try harder
+            # to infer this because some drivers need this information.
+            geometry_type = "Unknown"
+
+        if isinstance(path, Path):
+            path = str(path)
+
+        # There may be more endings worth special-casing here but zipped FlatGeoBuf
+        # is particularly useful and isn't automatically recognized
+        if driver is None and path.endswith(".fgb.zip"):

Review Comment:
   `path` can be an `io.BytesIO` (per the type hints/docs), but this code 
unconditionally calls `path.endswith(".fgb.zip")`. That will raise 
`AttributeError` for non-string paths. Consider guarding the suffix check with 
`isinstance(path, str)` (or coercing `PathLike` only), and explicitly raising a 
`ValueError` when `path` is a `BytesIO` and `driver` is not provided (as the 
docstring requires).
   ```suggestion
           if isinstance(path, io.BytesIO) and driver is None:
               raise ValueError("driver must be provided when path is a BytesIO")
   
           # There may be more endings worth special-casing here but zipped FlatGeoBuf
           # is particularly useful and isn't automatically recognized
           if driver is None and isinstance(path, str) and path.endswith(".fgb.zip"):
   ```
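The same guard can be pulled into a standalone function, which makes the three input kinds (`str`, `Path`, `BytesIO`) easy to test in isolation. The sketch below is an illustration, not the PR's code: `normalize_path_and_driver` is a hypothetical name, and the `"FlatGeobuf"` driver assigned for `.fgb.zip` is an assumption, since the diff hunk ends before the line that sets it.

```python
import io
import os
from pathlib import Path
from typing import Optional, Tuple, Union


def normalize_path_and_driver(
    path: Union[str, Path, io.BytesIO], driver: Optional[str] = None
) -> Tuple[Union[str, io.BytesIO], Optional[str]]:
    """Return a (path, driver) pair that is safe to hand to the writer.

    Hypothetical helper; not part of the PR.
    """
    # In-memory buffers have no file name to infer a driver from, so an
    # explicit driver is required, as the docstring states.
    if isinstance(path, io.BytesIO):
        if driver is None:
            raise ValueError("driver must be provided when path is a BytesIO")
        return path, driver

    # Coerce Path (or any other os.PathLike) to a plain string.
    path = os.fspath(path)

    # Zipped FlatGeobuf is special-cased by suffix. The driver name used
    # here is an assumption about what the PR assigns.
    if driver is None and path.endswith(".fgb.zip"):
        driver = "FlatGeobuf"

    return path, driver
```

With this shape the suffix check only ever runs on a string, so the `AttributeError` described above cannot occur, and a missing driver for a `BytesIO` target fails with a message that matches the documented requirement.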



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.