RjLi13 opened a new pull request, #15299:
URL: https://github.com/apache/iceberg/pull/15299

   This is part 2 after splitting PR 
https://github.com/apache/iceberg/pull/15059
   
   Part 1 PR is here: https://github.com/apache/iceberg/pull/15298.
   
   This PR only introduces the new async Spark micro-batch planner and the 
changes needed to enable it.
   
   Full context is in https://github.com/apache/iceberg/pull/15059 and is 
reposted below:
   
   -------------
   Implements a new feature for Spark Structured Streaming and Iceberg users 
known as the Async Spark Micro Batch Planner.
   
   Currently, micro-batch planning in Iceberg is synchronous. A streaming query 
plans which batches to read and how many rows and files go in each batch, 
processes the data, and then repeats. The async planner pre-fetches table 
metadata and file scan tasks in a background thread, so planning overlaps with 
data processing. This reduces micro-batch planning latency and improves 
streaming performance on large data volumes.
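
   The overlap between planning and processing can be illustrated with a 
minimal, self-contained sketch. It is not the code in this PR: the class 
PrefetchingPlannerSketch, its constructor parameters, and the use of file 
names as stand-ins for file scan tasks are all hypothetical. It only shows 
the general pattern of a background thread that plans ahead into a bounded 
queue while the caller consumes earlier batches.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntFunction;

// Hypothetical illustration only; not the planner added by this PR.
// A single background thread plans batches ahead of the consumer.
final class PrefetchingPlannerSketch implements AutoCloseable {
  // Bounded buffer: the planner can run at most `lookahead` batches ahead.
  private final BlockingQueue<List<String>> planned;

  private final ExecutorService worker =
      Executors.newSingleThreadExecutor(
          r -> {
            Thread t = new Thread(r, "planner-sketch");
            t.setDaemon(true); // do not keep the JVM alive for the sketch
            return t;
          });

  // planBatch stands in for the expensive metadata and file-scan planning.
  PrefetchingPlannerSketch(IntFunction<List<String>> planBatch, int batches, int lookahead) {
    this.planned = new ArrayBlockingQueue<>(lookahead);
    worker.submit(
        () -> {
          try {
            for (int i = 0; i < batches; i++) {
              // put() blocks while the buffer is full, which throttles the planner.
              planned.put(planBatch.apply(i));
            }
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop planning on close()
          }
        });
  }

  // Returns the next planned batch, waiting only if planning has not caught up.
  // Calling this more than `batches` times would block; a real planner needs
  // an end-of-stream or error signal, which this sketch leaves out.
  List<String> nextBatch() {
    try {
      return planned.take();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for a planned batch", e);
    }
  }

  @Override
  public void close() {
    worker.shutdownNow(); // interrupts the planning thread
  }
}
```

   While the caller processes the batch returned by nextBatch(), the 
background thread is already planning the following ones, so the wait at the 
start of each micro-batch shrinks to the time needed to take an item from the 
queue.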
   
   This PR adds an option, 
spark.sql.iceberg.async-micro-batch-planning-enabled, that users can set to 
enable async planning. The code in SparkMicroBatchStream.java moves to 
SyncSparkMicroBatchPlanner.java, and SparkMicroBatchStream selects which 
planner to use. The option defaults to false, so existing behavior is 
unchanged.
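
   As a rough sketch of the selection logic described above, assuming the 
option is read as a boolean string with a default of false: only the property 
name comes from this PR. The types PlannerSketch, SyncPlannerSketch, 
AsyncPlannerSketch, and PlannerSelectionSketch are hypothetical stand-ins, 
and the plain Map replaces the real Spark configuration object.

```java
import java.util.Map;

// Hypothetical stand-in for a planner abstraction; not the PR's interface.
interface PlannerSketch {
  String name();
}

final class SyncPlannerSketch implements PlannerSketch {
  @Override
  public String name() {
    return "sync";
  }
}

final class AsyncPlannerSketch implements PlannerSketch {
  @Override
  public String name() {
    return "async";
  }
}

final class PlannerSelectionSketch {
  // Property name taken from the PR description.
  static final String ASYNC_ENABLED = "spark.sql.iceberg.async-micro-batch-planning-enabled";

  private PlannerSelectionSketch() {}

  // A missing key falls back to "false", so the synchronous path stays the default.
  static PlannerSketch choose(Map<String, String> conf) {
    boolean async = Boolean.parseBoolean(conf.getOrDefault(ASYNC_ENABLED, "false"));
    return async ? new AsyncPlannerSketch() : new SyncPlannerSketch();
  }
}
```

   With an empty configuration, choose() returns the synchronous planner, 
which mirrors the claim that existing behavior is unchanged unless the option 
is set to true.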
   
   This feature was originally authored by Drew Goya in our Netflix fork for 
Spark 3.3 and Iceberg 1.4. I built on Drew's work by porting it to Spark 3.5 
and 4.1 and to the current Iceberg version.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
