paleolimbot commented on code in PR #44:
URL: https://github.com/apache/sedona-db/pull/44#discussion_r2334902802


##########
python/sedonadb/python/sedonadb/context.py:
##########
@@ -120,12 +126,28 @@ def read_parquet(self, table_paths: Union[str, Path, 
Iterable[str]]) -> DataFram
             >>> sedonadb.connect().read_parquet(url)
             <sedonadb.dataframe.DataFrame object at ...>
 
+            >>> # Test with options parameter (empty options should work the 
same)
+            >>> sedonadb.connect().read_parquet(url, options={})
+            <sedonadb.dataframe.DataFrame object at ...>
+
+            >>> # Test with None options (should work the same as no options)
+            >>> sedonadb.connect().read_parquet(url, options=None)
+            <sedonadb.dataframe.DataFrame object at ...>
+
+            >>> # Test S3 options (using HTTP URL for reliable testing)
+            >>> sedonadb.connect().read_parquet(url, options={"aws.nosign": 
True})
+            <sedonadb.dataframe.DataFrame object at ...>

Review Comment:
   These examples are a bit confusing...probably just the third one makes sense 
but only if `url` is `s3://` (I don't have a plausible but tiny s3 example off 
the top of my head). We can also omit this until we find a better example 
(we'll use this example in the Python user guides/README, so there will be 
example usage somewhere).



##########
python/sedonadb/src/context.rs:
##########
@@ -74,11 +74,29 @@ impl InternalContext {
         &self,
         py: Python<'py>,
         table_paths: Vec<String>,
+        options: HashMap<String, PyObject>,
     ) -> Result<InternalDataFrame, PySedonaError> {
+        // Convert Python options dict to Rust HashMap<String, String>
+        let rust_options: HashMap<String, String> = options
+            .into_iter()
+            .map(|(k, v)| {
+                let v_str = if v.is_none(py) {
+                    String::new()
+                } else {
+                    // Convert PyObject to string
+                    match v.call_method0(py, "__str__") {
+                        Ok(str_obj) => str_obj.extract::<String>(py)?,
+                        Err(_) => v.extract::<String>(py)?,
+                    }

Review Comment:
   I think that if you make `options: HashMap<String, Option<String>>` you can 
remove this



##########
rust/sedona/src/context.rs:
##########
@@ -229,19 +229,20 @@ impl SedonaContext {
     pub async fn read_parquet<P: DataFilePaths>(
         &self,
         table_paths: P,
-        options: GeoParquetReadOptions<'_>,
+        options: HashMap<String, String>,

Review Comment:
   It would probably make more sense to keep this as `GeoParquetReadOptions` 
and push the `GeoParquetReadOptions::from_table_options()` one level up



##########
python/sedonadb/tests/test_context.py:
##########
@@ -63,7 +63,104 @@ def test_dynamic_object_stores():
     schema = pa.schema(con.read_parquet(url))
     assert schema.field("geometry").type.extension_name == "geoarrow.wkb"
 
-    # Fresh context
-    con = sedonadb.connect()
-    schema = pa.schema(con.sql(f"SELECT * FROM '{url}'"))
-    assert schema.field("geometry").type.extension_name == "geoarrow.wkb"
+
+def test_read_parquet_options_parameter(con, geoarrow_data):
+    """Test the options parameter functionality for read_parquet()"""
+    test_file = geoarrow_data / 
"quadrangles/files/quadrangles_100k_geo.parquet"
+
+    # Test 1: Backward compatibility - no options parameter
+    tab1 = con.read_parquet(test_file).to_arrow_table()
+    assert tab1["geometry"].type.extension_name == "geoarrow.wkb"
+
+    # Test 2: options=None (explicit None)
+    tab2 = con.read_parquet(test_file, options=None).to_arrow_table()
+    assert tab2["geometry"].type.extension_name == "geoarrow.wkb"
+    assert len(tab2) == len(tab1)  # Should be identical
+
+    # Test 3: Empty options dictionary
+    tab3 = con.read_parquet(test_file, options={}).to_arrow_table()
+    assert tab3["geometry"].type.extension_name == "geoarrow.wkb"
+    assert len(tab3) == len(tab1)  # Should be identical
+
+    # Test 4: Options with string values
+    tab4 = con.read_parquet(
+        test_file, options={"test.option": "value"}
+    ).to_arrow_table()
+    assert tab4["geometry"].type.extension_name == "geoarrow.wkb"
+    assert len(tab4) == len(
+        tab1
+    )  # Should be identical (option ignored but not errored)

Review Comment:
   I appreciate the comprehensiveness of the other tests, but I think this is 
probably the only test we need to maintain for this particular feature. I think 
we should use an actual s3 URL to make sure our parameters made it through, 
though. (There are some non-geometry ones in the sedona-cli tests, or we could 
use something in a wherobots public bucket if you know if a small one).



##########
rust/sedona-geoparquet/src/provider.rs:
##########
@@ -78,22 +78,38 @@ pub async fn geoparquet_listing_table(
 #[derive(Default, Clone)]
 pub struct GeoParquetReadOptions<'a> {
     inner: ParquetReadOptions<'a>,
+    table_options: Option<HashMap<String, String>>,
 }
 
 impl GeoParquetReadOptions<'_> {
     /// Create a new GeoParquetReadOptions with default values
     pub fn new() -> Self {
         Default::default()
     }
+
+    /// Create GeoParquetReadOptions from table options HashMap
+    pub fn from_table_options(options: HashMap<String, String>) -> Self {
+        GeoParquetReadOptions {
+            inner: ParquetReadOptions::default(),
+            table_options: Some(options),
+        }
+    }
 }
 
 #[async_trait]
 impl ReadOptions<'_> for GeoParquetReadOptions<'_> {
     fn to_listing_options(
         &self,
         config: &SessionConfig,
-        table_options: TableOptions,
+        mut table_options: TableOptions,
     ) -> ListingOptions {
+        // Merge custom table options if provided
+        if let Some(ref custom_options) = self.table_options {
+            for (key, value) in custom_options {
+                let _ = table_options.set(key, value);
+            }
+        }

Review Comment:
   Is there a way to reject unknown options? (Probably better for the Python 
user experience if they misspelled something)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to