Re: [PR] Feature: Incremental Append Scan [iceberg-python]

via GitHub Mon, 18 May 2026 08:21:26 -0700


smaheshwar-pltr commented on code in PR #3364:
URL: https://github.com/apache/iceberg-python/pull/3364#discussion_r3260034514



##########
pyiceberg/table/__init__.py:
##########
@@ -1165,6 +1165,59 @@ def scan(
             table_identifier=self._identifier,
         )
 
+    def incremental_append_scan(
+        self,
+        row_filter: str | BooleanExpression = ALWAYS_TRUE,
+        selected_fields: tuple[str, ...] = ("*",),
+        case_sensitive: bool = True,
+        from_snapshot_id_exclusive: int | None = None,
+        to_snapshot_id_inclusive: int | None = None,
+        options: Properties = EMPTY_DICT,
+        limit: int | None = None,
+    ) -> IncrementalAppendScan:
+        """Fetch an IncrementalAppendScan based on the table's current metadata.
+
+        The incremental append scan returns the rows added by append snapshots in a snapshot
+        range that match the provided row_filter, projected onto the table's current schema.
+
+        Args:
+            row_filter:
+                A string or BooleanExpression that describes the
+                desired rows.
+            selected_fields:
+                A tuple of strings representing the column names
+                to return in the output dataframe.
+            case_sensitive:
+                If True column matching is case sensitive.
+            from_snapshot_id_exclusive:

Review Comment:
   Requiring `from_snapshot_id_exclusive` to be non-`None` at plan time is a deliberate divergence from Java's [`IncrementalScan.fromSnapshotInclusive`](https://github.com/apache/iceberg/blob/2f6606a247e2b16be46ca6c02fc4cfc2e17691e6/api/src/main/java/org/apache/iceberg/IncrementalScan.java#L34) (which defaults to the oldest ancestor) and follows Spark's required `start-snapshot-id` ([docs](https://iceberg.apache.org/docs/latest/spark-queries/#incremental-read)). The argument is [here](https://github.com/apache/iceberg-python/pull/2031#discussion_r2102674779). TL;DR: an append scan only reads `append` snapshots, so "from the oldest ancestor" would be misleading after a `replace`.
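To illustrate the point about `replace` snapshots, here is a minimal, self-contained sketch of how an append-only range might be selected. It does not use pyiceberg: `Snap`, `append_snapshots_between`, and the sample `history` are hypothetical names invented for this example, and requiring both bounds is an assumption of the sketch rather than a description of the PR's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Snap:
    """Hypothetical stand-in for a table snapshot (not pyiceberg's Snapshot)."""

    snapshot_id: int
    parent_id: Optional[int]
    operation: str  # e.g. "append", "replace", "overwrite", "delete"


def append_snapshots_between(
    snapshots: List[Snap],
    from_snapshot_id_exclusive: Optional[int],
    to_snapshot_id_inclusive: int,
) -> List[int]:
    """Return ids of append snapshots in (from, to], newest first."""
    # Assumption: the start bound is mandatory, as the review comment argues.
    if from_snapshot_id_exclusive is None:
        raise ValueError("from_snapshot_id_exclusive must be set")

    by_id = {s.snapshot_id: s for s in snapshots}
    selected: List[int] = []
    current_id: Optional[int] = to_snapshot_id_inclusive

    # Walk the parent chain from the end snapshot back to the start snapshot.
    while current_id is not None and current_id != from_snapshot_id_exclusive:
        snap = by_id[current_id]
        if snap.operation == "append":
            selected.append(snap.snapshot_id)
        current_id = snap.parent_id

    # Running off the root means the start snapshot was never on the chain.
    if current_id is None:
        raise ValueError("from snapshot is not an ancestor of to snapshot")
    return selected


# Hypothetical history: append -> append -> replace -> append.
history = [
    Snap(1, None, "append"),
    Snap(2, 1, "append"),
    Snap(3, 2, "replace"),
    Snap(4, 3, "append"),
]

print(append_snapshots_between(history, 1, 4))
```

With this history the `replace` snapshot 3 is skipped, so a range that silently started "from the oldest ancestor" would still not return everything written before the rewrite. That is why an explicit start bound is easier to reason about.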



##########
pyiceberg/table/__init__.py:
##########
@@ -1668,6 +1721,18 @@ def scan(
     ) -> DataScan:
         raise ValueError("Cannot scan a staged table")
 
+    def incremental_append_scan(

Review Comment:
   Mirrors `StagedTable.scan` two lines up: staged tables have no committed metadata to scan against.
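The override pattern described here can be sketched as follows. The classes are simplified, hypothetical stand-ins rather than pyiceberg's real `Table` and `StagedTable`, and reusing the "Cannot scan a staged table" message for the new method is an assumption based on the neighbouring `scan` override shown in the diff.

```python
from typing import Optional, Tuple


class Table:
    """Hypothetical, simplified table with committed metadata."""

    def incremental_append_scan(
        self,
        from_snapshot_id_exclusive: Optional[int] = None,
        to_snapshot_id_inclusive: Optional[int] = None,
    ) -> Tuple[str, Optional[int], Optional[int]]:
        # Placeholder return value standing in for a real scan object.
        return ("incremental-append-scan", from_snapshot_id_exclusive, to_snapshot_id_inclusive)


class StagedTable(Table):
    """Hypothetical staged table: nothing has been committed yet."""

    def incremental_append_scan(
        self,
        from_snapshot_id_exclusive: Optional[int] = None,
        to_snapshot_id_inclusive: Optional[int] = None,
    ) -> Tuple[str, Optional[int], Optional[int]]:
        # Assumed message, copied from the scan override in the diff.
        raise ValueError("Cannot scan a staged table")


def staged_scan_error() -> str:
    """Return the error text raised when scanning a staged table."""
    try:
        StagedTable().incremental_append_scan(from_snapshot_id_exclusive=1)
    except ValueError as exc:
        return str(exc)
    return ""
```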



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

