Thanks for your response!

I actually turned off that feature using the flag you mentioned and it
worked perfectly. To give you some context on why I'm looking into this:
last year I was running some streaming jobs on Spark 3.5 using a 16GB
standalone Databricks cluster (trying to keep costs down!). I noticed over
500 threads spawning just because of shuffle.
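
Going back to the flag: this is roughly all I set to turn the feature
off in my test (the builder/master bits are just my local setup, not
anything prescriptive):

  // Sketch of my test session, with the checksum checkpoint manager
  // disabled via the flag from your reply. local[*] is just my setup.
  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("checksum-checkpoint-test")
    .config("spark.sql.streaming.checkpoint.fileChecksum.enabled", "false")
    .getOrCreate()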

I’ve always heard the advice that unless you’re hitting a specific issue,
it’s often better to stick to default configurations—kind of like the
common wisdom for JVM parameters. So, we kept the default 200 shuffle
partitions, even though that’s usually overkill for many streaming
workloads.

I even saw this in our unit tests: execution time dropped from 14
minutes to 7 just by decreasing the shuffle partitions to 1. Since the
tests barely handle any data, the per-partition overhead was killing
the performance.
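
To make it concrete, the only relevant change in the tests was along
these lines (a sketch; where exactly we set it doesn't really matter):

  // Shrink shuffle partitions for tiny test datasets so the
  // per-partition scheduling and state overhead stops dominating.
  spark.conf.set("spark.sql.shuffle.partitions", "1")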

I really think lowering the default shuffle partitions for streaming
jobs would benefit most use cases. And now, with this new checkpointing
feature, it feels even more necessary to avoid that resource bloat!

I could be completely wrong, though. What do you all think?

On Tue, Feb 3, 2026 at 10:12, B. Micheal Okutubo via dev (<
[email protected]>) wrote:

> Hi Ángel,
>
> Yeah the ChecksumCheckpointFileManager creates background threads to
> parallelize file uploads from threads using the state store. This only
> applies to stateful queries (i.e. using state store). You can turn it off
> by setting `spark.sql.streaming.checkpoint.fileChecksum.enabled` to `false`
> and see how your experiment behaves.
>
> Also, for stateful queries, for each partition, Spark creates a state
> store instance. If you're using the default store in Spark, then it is an
> in-memory map. As you increase the number of partitions, the number of
> store instances increases as well, hence node resource usage increases.
>
> In reality, you typically won't be running that many partitions on a
> single node and will likely set it based on the number of cores you have
> available. Or do you have a specific use case that requires 1000 state
> partitions on a single node?
>
> But yeah, a lower default might be useful.
>
> On Mon, Feb 2, 2026 at 10:10 PM Ángel Álvarez Pascua <
> [email protected]> wrote:
>
>> Hi,
>>
>> While running some tests for an article last weekend, I came across the
>> new ChecksumCheckpointFileManager feature in Spark 4.1.
>>
>> This feature spawns *4 threads per output partition* in streaming jobs.
>> Since streaming jobs also use the *default 200 shuffle partitions*, this
>> could pose a significant resource risk. In fact, I tested it by increasing
>> spark.sql.shuffle.partitions to 1000 in a simple streaming job and ran
>> into an *OOM error*—likely due to the creation of 4,000 extra threads (4
>> threads × 1000 partitions).
>>
>> I’ve opened *SPARK-55311
>> <https://issues.apache.org/jira/browse/SPARK-55311>* regarding this. My
>> suggestion is that, similar to how *AQE is disabled in streaming jobs*,
>> we might consider *defaulting to a lower spark.sql.shuffle.partitions
>> value* (e.g., 25) for streaming workloads.
>>
>> I’d love to hear your thoughts on this.
>>
>> Regards,
>> Ángel Álvarez
>>
>
