[jira] [Created] (SPARK-49259) Size based partition creation during kafka read

Subham Singhal (Jira) Fri, 16 Aug 2024 02:05:08 -0700

Subham Singhal created SPARK-49259:
--------------------------------------

             Summary: Size based partition creation during kafka read
                 Key: SPARK-49259
                 URL: https://issues.apache.org/jira/browse/SPARK-49259
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 4.0.0
            Reporter: Subham Singhal



Currently the Kafka source for Spark Structured Streaming provides the 
*minPartitions* option to create more Spark partitions than the Kafka topic 
has. This helps increase parallelism, but the value cannot be changed 
dynamically.

It would be better to increase the number of Spark partitions dynamically 
based on input size: when the input size is high, create more partitions. We 
could take the *avg msg size* and *maxBytesPerPartition* as inputs and create 
partitions dynamically to handle varying loads.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

