Re: [I] Query on nested struct field with PyIceberg? [iceberg-python]

via GitHub Thu, 25 Jul 2024 06:39:27 -0700


cfrancois7 commented on issue #953:
URL: https://github.com/apache/iceberg-python/issues/953#issuecomment-2250346701


   The PR resolves the first issue (the parsing error).
   The second issue, raised by the PyArrow scan, is still there: `ArrowInvalid: No match for FieldRef.Name(status) in id: int32`
   
   ```python
   File ~/projects/mpdata/my_proj/notebooks/pyiceberg/io/pyarrow.py:1195, in _task_to_record_batches(fs, task, bound_row_filter, projected_schema, projected_field_ids, positional_deletes, case_sensitive, name_mapping)
      1192 if file_schema is None:
      1193     raise ValueError(f"Missing Iceberg schema in Metadata for file: {path}")
   -> 1195 fragment_scanner = ds.Scanner.from_fragment(
      1196     fragment=fragment,
      1197     # With PyArrow 16.0.0 there is an issue with casting record-batches:
      1198     # https://github.com/apache/arrow/issues/41884
      1199     # https://github.com/apache/arrow/issues/43183
      1200     # Would be good to remove this later on
      1201     schema=_pyarrow_schema_ensure_large_types(physical_schema),
      1202     # This will push down the query to Arrow.
      1203     # But in case there are positional deletes, we have to apply them first
      1204     filter=pyarrow_filter if not positional_deletes else None,
      1205     columns=[col.name for col in file_project_schema.columns],
      1206 )
      1208 current_index = 0
      1209 batches = fragment_scanner.to_batches()
   
   File ~/.anaconda3/envs/my_proj/lib/python3.12/site-packages/pyarrow/_dataset.pyx:3558, in pyarrow._dataset.Scanner.from_fragment()
   
   File ~/.anaconda3/envs/my_proj/lib/python3.12/site-packages/pyarrow/_dataset.pyx:3327, in pyarrow._dataset._populate_builder()
   
   File ~/.anaconda3/envs/my_proj/lib/python3.12/site-packages/pyarrow/_compute.pyx:2700, in pyarrow._compute._bind()
   
   File ~/.anaconda3/envs/my_proj/lib/python3.12/site-packages/pyarrow/error.pxi:154, in pyarrow.lib.pyarrow_internal_check_status()
   
   File ~/.anaconda3/envs/my_proj/lib/python3.12/site-packages/pyarrow/error.pxi:91, in pyarrow.lib.check_status()
   
   ArrowInvalid: No match for FieldRef.Name(status) in id: int32
   name: large_string
   age: int32
   address: struct<street: large_string, city: large_string, postal_code: large_string>
   contact: struct<email: large_string, phone: large_string>
   employment: struct<status: large_string, position: large_string, company: struct<name: large_string, location: large_string>>
   preferences: struct<newsletter: bool, notifications: struct<email: bool, sms: bool>>
   ```
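
   The Arrow-side error can probably be reproduced without PyIceberg. Below is a minimal sketch, assuming only a recent PyArrow; the two-column table and the `filter_error` helper are made up for illustration and are not part of PyIceberg or of the PR. It suggests that a filter referencing the bare leaf name `status` cannot be bound against the top-level schema, while a nested field reference built from the full path can. This is not a fix for `_task_to_record_batches`, only an illustration of what the filter expression seems to need.

   ```python
   import pyarrow as pa
   import pyarrow.compute as pc
   import pyarrow.dataset as ds

   # Hypothetical, trimmed-down version of the table from the traceback.
   table = pa.table(
       {
           "id": pa.array([1, 2], type=pa.int32()),
           "employment": pa.array(
               [{"status": "employed"}, {"status": "unemployed"}],
               type=pa.struct([("status", pa.string())]),
           ),
       }
   )
   dataset = ds.dataset(table)


   def filter_error(expr):
       """Return the ArrowInvalid message raised by the filter, or None."""
       try:
           dataset.to_table(filter=expr)
           return None
       except pa.ArrowInvalid as exc:
           return str(exc)


   # A bare leaf name is looked up among the top-level columns only.
   flat_error = filter_error(pc.field("status") == "employed")

   # A nested reference carries the full path: employment -> status.
   nested = dataset.to_table(
       filter=pc.field("employment", "status") == "employed"
   )

   print(flat_error)
   print(nested.num_rows)
   ```

   If this matches what happens inside the scan, the filter handed to `ds.Scanner.from_fragment` would need to reference `employment.status` by its full path rather than by the leaf name alone.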


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
