yiftizur opened a new pull request, #2628:
URL: https://github.com/apache/iceberg-python/pull/2628

   
   Fixes #953
   
   # Rationale for this change
   Fixes filtering on nested struct fields when using PyArrow for scan 
operations.
   
   ## Are these changes tested?
   Yes, the full test suite + new tests
   
   ## Are there any user-facing changes?
   Filtering a scan on a nested struct field now works.
   
   ## Problem
   
   When filtering on nested struct fields (e.g., `parentField.childField == 
'value'`), PyArrow would fail with:
   ```
   ArrowInvalid: No match for FieldRef.Name(childField) in ...
   ```
   
   The issue occurred because PyArrow requires nested field references as 
tuples (e.g., `("parent", "child")`) rather than dotted strings (e.g., 
`"parent.child"`).
   
   ## Solution
   
   1. Modified `_ConvertToArrowExpression` to accept an optional `Schema` 
parameter
   2. Added `_get_field_name()` method that converts dotted field paths to 
tuples for nested struct fields
   3. Updated `expression_to_pyarrow()` to accept and pass the schema parameter
   4. Updated all call sites to pass the schema when available
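
The path conversion in step 2 can be sketched in plain Python. This is a hypothetical illustration, not the code from the PR: `dotted_path_to_ref` is an invented name, and a nested dict stands in for the Iceberg `Schema` (a field maps to `None` if it is a primitive, or to another dict if it is a struct).

```python
from collections.abc import Mapping
from typing import Any, Optional, Tuple, Union


def dotted_path_to_ref(
    name: str, schema: Optional[Mapping[str, Any]] = None
) -> Union[str, Tuple[str, ...]]:
    """Return a tuple of path parts for a nested struct field, else the name unchanged.

    Hypothetical helper: `schema` is a nested dict standing in for a real schema.
    """
    # Without a schema, or without a dot, there is nothing to resolve.
    if schema is None or "." not in name:
        return name
    # A top-level column whose name literally contains a dot is left as-is.
    if name in schema:
        return name
    parts = name.split(".")
    node: Any = schema
    for part in parts:
        # Stop if the path walks through a primitive or names a missing field.
        if not isinstance(node, Mapping) or part not in node:
            return name
        node = node[part]
    return tuple(parts)


example_schema = {"id": None, "parent": {"child": None}, "a.b": None}
nested_ref = dotted_path_to_ref("parent.child", example_schema)
flat_ref = dotted_path_to_ref("id", example_schema)
literal_dot_ref = dotted_path_to_ref("a.b", example_schema)
```

Walking the schema instead of splitting on every dot is one way to leave a top-level column whose name contains a literal dot untouched; whether the PR treats that case the same way is not stated in this description.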
   
   ## Changes
   
   - `pyiceberg/io/pyarrow.py`:
     - Modified `_ConvertToArrowExpression` class to handle nested field paths
     - Updated `expression_to_pyarrow()` signature to accept schema
     - Updated `_expression_to_complementary_pyarrow()` signature
   - `pyiceberg/table/__init__.py`:
     - Updated call to `_expression_to_complementary_pyarrow()` to pass schema
   - Tests:
     - Added `test_ref_binding_nested_struct_field()` for comprehensive nested 
field testing
     - Enhanced `test_nested_fields()` with issue #953 scenarios
   
   ## Example
   
   ```python
   # Now works correctly:
   table.scan(row_filter="parent.child == 'abc123'").to_polars()
   ```
   
   The fix converts the field reference from:
   - ❌ `FieldRef.Name(run_id)` (fails - field not found)
   - ✅ `FieldRef.Nested(FieldRef.Name(mazeMetadata) FieldRef.Name(run_id))` 
(works!)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

