Thanks. Having more context is always relevant.

And why 4 × partitions?

El mar, 3 feb 2026, 22:13, Jungtaek Lim <[email protected]>
escribió:

> Quick correction: we added the feature to "enable" AQE in "stateless"
> streaming queries in 4.1. I know that's not relevant to the main point but
> just wanted to let you know.
>
> On Tue, Feb 3, 2026 at 3:09 PM Ángel Álvarez Pascua <
> [email protected]> wrote:
>
>> Hi,
>>
>> While running some tests for an article last weekend, I came across the
>> new ChecksumCheckpointFileManager feature in Spark 4.1.
>>
>> This feature spawns *4 threads per output partition* in streaming jobs.
>> Since streaming jobs also use the *default 200 shuffle partitions*, this
>> could pose a significant resource risk. In fact, I tested it by increasing
>> spark.sql.shuffle.partitions to 1000 in a simple streaming job and ran
>> into an *OOM error*—likely due to the creation of 4,000 extra threads (4
>> threads × 1000 partitions).
>>
>> I’ve opened *SPARK-55311
>> <https://issues.apache.org/jira/browse/SPARK-55311>* regarding this. My
>> suggestion is that, similar to how *AQE is disabled in streaming jobs*,
>> we might consider *defaulting to a lower spark.sql.shuffle.partitions
>> value* (e.g., 25) for streaming workloads.
>>
>> I’d love to hear your thoughts on this.
>>
>> Regards,
>> Ángel Álvarez
>>
>

Reply via email to