Thanks for your response! I actually turned off that feature using the flag you mentioned, and it worked perfectly. To give you some context on why I'm looking into this: last year I was running some streaming jobs on Spark 3.5 on a 16GB standalone Databricks cluster (trying to keep costs down!), and I noticed over 500 threads spawning just because of shuffle.
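
In case it helps anyone reproduce the experiment, this is roughly the session setup I ended up with (a minimal sketch; the app name and local master are placeholders for my actual cluster setup):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("checksum-experiment")  // placeholder name
      .master("local[*]")              // stand-in for the real cluster
      // Disable the Spark 4.1 checksum checkpoint file manager and its
      // background upload threads:
      .config("spark.sql.streaming.checkpoint.fileChecksum.enabled", "false")
      .getOrCreate()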
I've always heard the advice that unless you're hitting a specific issue, it's often better to stick to default configurations (kind of like the common wisdom for JVM parameters). So we kept the default 200 shuffle partitions, even though that's usually overkill for many streaming workloads. I even saw this in our unit tests: execution time dropped from 14 minutes to 7 minutes just by decreasing the partitions to 1. Since tests don't handle much data, the overhead was simply killing performance.

I really think lowering the default shuffle partitions would benefit most streaming use cases in general, and with this new checkpointing feature it feels even more necessary to avoid that resource bloat. I may be completely wrong, though. What do you all think?

On Tue, Feb 3, 2026 at 10:12, B. Micheal Okutubo via dev (<[email protected]>) wrote:

> Hi Ángel,
>
> Yeah, the ChecksumCheckpointFileManager creates background threads to
> parallelize file uploads from threads using the state store. This only
> applies to stateful queries (i.e. those using the state store). You can
> turn it off by setting `spark.sql.streaming.checkpoint.fileChecksum.enabled`
> to `false` and see how your experiment behaves.
>
> Also, for stateful queries, Spark creates a state store instance for each
> partition. If you're using the default store in Spark, it is an in-memory
> map. As you increase the number of partitions, the number of store
> instances increases as well, and hence node resource usage increases.
>
> In reality, you typically won't be running that many partitions on a
> single node and will likely set the value based on the number of cores
> you have available. Or do you have a specific use case that requires 1000
> state partitions on a single node?
>
> But yeah, a lower default might be useful.
>
> On Mon, Feb 2, 2026 at 10:10 PM Ángel Álvarez Pascua <
> [email protected]> wrote:
>
>> Hi,
>>
>> While running some tests for an article last weekend, I came across the
>> new ChecksumCheckpointFileManager feature in Spark 4.1.
>>
>> This feature spawns *4 threads per output partition* in streaming jobs.
>> Since streaming jobs also use the *default 200 shuffle partitions*, this
>> could pose a significant resource risk. In fact, I tested it by
>> increasing spark.sql.shuffle.partitions to 1000 in a simple streaming
>> job and ran into an *OOM error*, likely due to the creation of 4,000
>> extra threads (4 threads × 1000 partitions).
>>
>> I've opened *SPARK-55311
>> <https://issues.apache.org/jira/browse/SPARK-55311>* regarding this. My
>> suggestion is that, similar to how *AQE is disabled in streaming jobs*,
>> we might consider *defaulting to a lower spark.sql.shuffle.partitions
>> value* (e.g., 25) for streaming workloads.
>>
>> I'd love to hear your thoughts on this.
>>
>> Regards,
>> Ángel Álvarez
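
P.S. For anyone curious about the unit-test speedup mentioned above, the change was essentially just one setting (a simplified sketch; the names are trimmed down from our real test harness):

    import org.apache.spark.sql.SparkSession

    // Shared session for small unit tests. With tiny data volumes, 200
    // shuffle partitions are mostly task-scheduling overhead, so we drop
    // them to 1.
    val testSpark = SparkSession.builder()
      .appName("unit-tests")  // placeholder
      .master("local[2]")
      .config("spark.sql.shuffle.partitions", "1")
      .getOrCreate()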
