Quick correction: we added the feature to "enable" AQE in "stateless" streaming queries in 4.1. I know that's not relevant to the main point but just wanted to let you know.
On Tue, Feb 3, 2026 at 3:09 PM Ángel Álvarez Pascua < [email protected]> wrote: > Hi, > > While running some tests for an article last weekend, I came across the > new ChecksumCheckpointFileManager feature in Spark 4.1. > > This feature spawns *4 threads per output partition* in streaming jobs. > Since streaming jobs also use the *default 200 shuffle partitions*, this > could pose a significant resource risk. In fact, I tested it by increasing > spark.sql.shuffle.partitions to 1000 in a simple streaming job and ran > into an *OOM error*—likely due to the creation of 4,000 extra threads (4 > threads × 1000 partitions). > > I’ve opened *SPARK-55311 > <https://issues.apache.org/jira/browse/SPARK-55311>* regarding this. My > suggestion is that, similar to how *AQE is disabled in streaming jobs*, > we might consider *defaulting to a lower spark.sql.shuffle.partitions > value* (e.g., 25) for streaming workloads. > > I’d love to hear your thoughts on this. > > Regards, > Ángel Álvarez >
