[PR] Spark: Support long-valued streaming max rows per micro-batch [iceberg]

via GitHub Tue, 26 May 2026 12:49:09 -0700


colinre opened a new pull request, #16571:
URL: https://github.com/apache/iceberg/pull/16571


   ### Problem
   
   Spark Structured Streaming row-based micro-batch planning was effectively 
capped at `Integer.MAX_VALUE` rows. This made very large initial backfills 
impractical because streams over multi-trillion-row tables could require 
thousands of micro-batches before reaching the live tail.
   
   ### Root Cause
   
   `streaming-max-rows-per-micro-batch` was parsed and stored as an `int`, and 
planner defaults initialized the effective row limit to `Integer.MAX_VALUE` 
even when no row limit was configured.
   
   ### Change
   
   Parse and propagate the streaming row soft limit as `long`, use 
`Long.MAX_VALUE` as the unconfigured row-limit sentinel, and preserve 
complete-file soft-limit behavior. File-count rate limiting is unchanged. This 
is a Codex change; I'm generally unfamiliar with this codebase. 
   
   ### Tests
   
   Added coverage for long-valued option parsing, unconfigured 
multi-trillion-row planning, explicit long-valued soft limits, planner default 
unpacking, and existing small row-limit behavior. The structured streaming 
planner tests cover both sync and async planning through existing 
parameterization.
   
   ### Compatibility
   
   Existing option names, offsets, checkpoint compatibility, file-count limits, 
and soft-limit semantics are unchanged. Existing values at or below 
`Integer.MAX_VALUE` keep their behavior.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] Spark: Support long-valued streaming max rows per micro-batch [iceberg]

Reply via email to