smaheshwar-pltr opened a new pull request, #3511:
URL: https://github.com/apache/iceberg-python/pull/3511

   # Rationale for this change
   
   Split out of #3364 at the reviewers' request (to @rambleraptor's and 
@kevinjqliu's point that the architecture refactor and the new feature are 
easier to review separately). This is PR 1 of 2 — the `IncrementalAppendScan` 
feature lands in a follow-up PR based on this branch.
   
   **This PR is a pure refactor with no behavioural change.** The diff looks 
large, but it is almost entirely code being moved, not changed.
   
   # Changes
   
   - Introduce `BaseScan(ABC)` as a superclass of `TableScan`, holding the 
snapshot-independent surface: `table_metadata`, `io`, `row_filter`, 
`selected_fields`, `case_sensitive`, `options`, `limit`, the chaining helpers 
(`select` / `filter` / `with_case_sensitive` / `update`), and the 
format-converter sinks (`to_pandas` / `to_polars` / `to_duckdb` / `to_ray`) 
built on an abstract `to_arrow()`.
   - `TableScan` keeps everything snapshot-specific — `snapshot_id`, `catalog`, 
`table_identifier`, `snapshot()`, the snapshot-aware `projection()`, 
`use_ref()`, and abstract `count()` — so its existing public surface is 
unchanged.
   - Extract `ManifestGroupPlanner` from `DataScan` (the `_build_*` evaluators, 
`_check_sequence_number`, and the manifest-entry / file-scan-task planning) and 
route all of `DataScan`'s local planning — `scan_plan_helper()` and 
`_plan_files_local()` — through it, so the partition / metrics / residual 
pipeline lives in exactly one place.
   
   # Back-compatibility
   
   - `to_arrow_batch_reader` stays concrete on `DataScan` and is **not** made 
abstract on `BaseScan`, so external `TableScan` subclasses that were valid 
before still instantiate.
   - `to_pandas` / `to_polars` become concrete defaults on `BaseScan`, and 
`to_duckdb` / `to_ray` move up to `BaseScan` too. This **loosens** 
`TableScan`'s abstract contract (they were abstract on `TableScan` before) 
without breaking existing subclasses.
   - All docstrings and explanatory comments are preserved.
   
   # Are these changes tested?
   
   The existing `DataScan` unit and integration tests exercise the refactored 
planning path unchanged. A regression test is added for the no-current-snapshot 
planning path, which now flows through `ManifestGroupPlanner`.
   
   # Are there any user-facing changes?
   
   No. Some previously-`DataScan`-only converters (`to_duckdb` / `to_ray`) are 
now inherited by `TableScan` as well, which is purely additive.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to