Copilot commented on code in PR #175:
URL: https://github.com/apache/sedona-db/pull/175#discussion_r2408615398
##########
rust/sedona-geoparquet/src/writer.rs:
##########
@@ -50,26 +66,33 @@ pub fn create_geoparquet_writer_physical_plan(
}
// If there is no geometry, just use the inner implementation
- let output_geometry_column_indices =
conf.output_schema().geometry_column_indices()?;
+ let mut output_geometry_column_indices =
conf.output_schema().geometry_column_indices()?;
if output_geometry_column_indices.is_empty() {
return create_inner_writer(input, conf, order_requirements,
options.inner.clone());
}
// We have geometry and/or geography! Collect the GeoParquetMetadata we'll
need to write
let mut metadata = GeoParquetMetadata::default();
+ let mut bbox_colunns = HashMap::new();
Review Comment:
Typo in variable name bbox_colunns; should be bbox_columns. Please rename
here and all subsequent references (e.g., lines 85, 145) for clarity.
##########
rust/sedona-geoparquet/src/options.rs:
##########
@@ -73,3 +82,19 @@ impl Default for GeoParquetVersion {
Self::V1_0
}
}
+
+impl FromStr for GeoParquetVersion {
+ type Err = DataFusionError;
+
+ fn from_str(s: &str) -> Result<Self, Self::Err> {
+ match s.to_lowercase().as_str() {
+ "1.0" => Ok(GeoParquetVersion::V1_0),
+ "1.1" => Ok(GeoParquetVersion::V1_1),
+ "2.0" => Ok(GeoParquetVersion::V2_0),
+ "none" => Ok(GeoParquetVersion::Omitted),
+ _ => plan_err!(
+ "Unexpected GeoParquet version string (expected '1.0', '1.1',
'2.0', or 'none'"
Review Comment:
Error message is missing a closing parenthesis. Add ) at the end to improve
clarity.
```suggestion
"Unexpected GeoParquet version string (expected '1.0',
'1.1', '2.0', or 'none')"
```
##########
python/sedonadb/python/sedonadb/dataframe.py:
##########
@@ -313,6 +314,19 @@ def to_parquet(
file vs. writing one file per partition to a directory. By
default,
a single file is written if `partition_by` is unspecified and
`path` ends with `.parquet`.
+ geoparquet_version: GeoParquet metadata version to write if output
contains
+ one or more geometry columns. The default (1.0) is the most
widely
+ supported and will result in geometry columns being recognized
in many
+ readers; however, only includes statistics at the file level.
+
+ Use GeoParquet 1.1 to compute an additional bounding box column
+ for every geometry column in the output: some readers can use
these columns
+ to prune row groups when files contain an effective spatial
ordering.
+ The extra columns will be appear just before their geometry
column and
Review Comment:
Grammatical issue: 'will be appear' should be 'will appear'.
```suggestion
The extra columns will appear just before their geometry
column and
```
##########
rust/sedona-geoparquet/src/writer.rs:
##########
@@ -262,4 +540,241 @@ mod test {
test_dataframe_roundtrip(ctx, df).await;
}
+
+ #[tokio::test]
+ async fn geoparquet_1_1_basic() {
+ let example = test_geoparquet("example", "geometry").unwrap();
+ let ctx = setup_context();
+ let df = ctx
+ .table(&example)
+ .await
+ .unwrap()
+ // DataFusion internals loose the nullability we assigned to the
bbox
Review Comment:
Word 'loose' should be 'lose'. Also update the identical occurrences on
lines 587, 635, and 672.
```suggestion
// DataFusion internals lose the nullability we assigned to the
bbox
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]