dbnl-tjs opened a new issue, #3040:
URL: https://github.com/apache/iceberg-python/issues/3040

   ### Apache Iceberg version
   
   0.10.0 (latest release)
   
   ### Please describe the bug 🐞
   
   Projected reads on a partitioned table can fail with:
   
   `ValueError: Could not find field with id: 2`
   
   This occurs when scanning with `selected_fields` that exclude the partition source column (for example, projecting only `field1` while the table is partitioned by `day(timestamp)`).
   
   ## Expected behavior
   
   A projected read should succeed even if the projection does not include partition source columns.
   
   ## Actual behavior
   
   `table.scan(..., selected_fields=(...)).to_arrow()` fails in PyIceberg planning/execution with:
   
   ```
   ValueError: Could not find field with id: 2
   ```
   
   ## Environment
   
   - Python: 3.14.2
   - pyiceberg: 0.10.0
   - pyiceberg-core: 0.6.0
   - backend: SQL catalog
   - IO impl: `pyiceberg.io.pyarrow.PyArrowFileIO`
   
   ## Repro
   
   ```python
   from datetime import UTC, datetime
   from pathlib import Path
   import tempfile
   
   import pyarrow as pa
   from pyiceberg.catalog import load_catalog
   from pyiceberg.expressions import AlwaysTrue
   from pyiceberg.partitioning import PartitionField, PartitionSpec
   from pyiceberg.schema import Schema
   from pyiceberg.transforms import DayTransform
   from pyiceberg.types import NestedField, StringType, TimestamptzType
   
   with tempfile.TemporaryDirectory(prefix="pyiceberg-repro-") as tmp:
       tmp_path = Path(tmp)
       db_path = tmp_path / "catalog.db"
       warehouse_path = tmp_path / "warehouse"
       warehouse_path.mkdir(parents=True, exist_ok=True)
   
       catalog = load_catalog(
           "default",
           **{
               "type": "sql",
               "uri": f"sqlite:///{db_path}",
               "warehouse": warehouse_path.as_uri(),
               "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",
           },
       )
   
       catalog.create_namespace_if_not_exists("ns")
   
       schema_v1 = Schema(
           NestedField(field_id=1, name="timestamp", field_type=TimestamptzType(), required=True),
           NestedField(field_id=2, name="value", field_type=StringType(), required=False),
       )
       spec = PartitionSpec(
           PartitionField(source_id=1, field_id=1000, transform=DayTransform(), name="_day"),
       )
   
       table = catalog.create_table("ns.tbl", schema=schema_v1, partition_spec=spec)
   
       # File 1: old schema (no new_col)
       table.append(
           pa.Table.from_pylist(
               [{"timestamp": datetime(2025, 1, 1, tzinfo=UTC), "value": "old"}],
               schema=pa.schema(
                   [
                       pa.field("timestamp", pa.timestamp("us", tz="UTC"), nullable=False),
                       pa.field("value", pa.string()),
                   ]
               ),
           )
       )
   
       # Evolve schema
       with table.update_schema() as u:
           u.add_column("new_col", StringType())
   
       table = catalog.load_table("ns.tbl")
   
       # File 2: new schema (has new_col)
       table.append(
           pa.Table.from_pylist(
               [{"timestamp": datetime(2025, 1, 2, tzinfo=UTC), "value": "new", "new_col": "x"}],
               schema=pa.schema(
                   [
                       pa.field("timestamp", pa.timestamp("us", tz="UTC"), nullable=False),
                       pa.field("value", pa.string()),
                       pa.field("new_col", pa.string()),
                   ]
               ),
           )
       )
   
       # Repro on affected versions:
       print(table.scan(row_filter=AlwaysTrue(), selected_fields=("new_col",)).to_arrow())
   
       # Workaround:
       # print(table.scan(row_filter=AlwaysTrue(), selected_fields=("new_col", "timestamp")).to_arrow())
   ```
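
The workaround in the repro (adding `timestamp` to the projection by hand) could be generalized with a small helper. The following is only a sketch, not a fix: `with_partition_sources` is a hypothetical name, and it assumes the table object exposes `spec().fields` (each with a `source_id`) and `schema().find_column_name(field_id)`, as PyIceberg's `Table`, `PartitionSpec`, and `Schema` are understood to do. It only considers the table's current partition spec.

```python
from typing import Any, Iterable, Tuple


def with_partition_sources(table: Any, selected_fields: Iterable[str]) -> Tuple[str, ...]:
    """Return selected_fields extended with every partition source column.

    Hypothetical helper: `table` is assumed to behave like a PyIceberg Table,
    i.e. to provide `schema()` and `spec()`.
    """
    # Drop duplicates while preserving the caller's ordering.
    selected = list(dict.fromkeys(selected_fields))

    # "*" already selects every column, so nothing needs to be added.
    if "*" in selected:
        return tuple(selected)

    schema = table.schema()
    for partition_field in table.spec().fields:
        # Map the partition field's source id back to a column name.
        name = schema.find_column_name(partition_field.source_id)
        if name is not None and name not in selected:
            selected.append(name)
    return tuple(selected)


# Intended use against the repro table (requires pyiceberg, so left as a comment):
# fields = with_partition_sources(table, ("new_col",))
# table.scan(row_filter=AlwaysTrue(), selected_fields=fields).to_arrow()
```

With this approach the result contains the extra partition source columns, so callers that want only the originally projected columns would have to drop them from the Arrow table afterwards.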
   
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

