Subham Singhal created SPARK-49259:
--------------------------------------
Summary: Size based partition creation during kafka read
Key: SPARK-49259
URL: https://issues.apache.org/jira/browse/SPARK-49259
Project: Spark
Issue Type: Improvement
Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: Subham Singhal
Currently, the Kafka source for Spark Structured Streaming provides the
*minPartitions* config to create more Spark partitions than the Kafka topic
has. This helps increase parallelism, but the value cannot be changed
dynamically.
It would be better to increase the number of Spark partitions dynamically
based on input size: when the input size is high, create more partitions. We
could take *avg msg size* and *maxBytesPerPartition* as inputs and create
partitions dynamically to handle varying loads.
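
A rough sketch of the idea, in plain Python rather than Spark code. It assumes
the input size of a Kafka offset range can be estimated as (number of
messages) x (avg msg size), and it splits the range so that no piece exceeds
maxBytesPerPartition. The names OffsetRange and split_by_size are hypothetical
and used only for illustration; they are not existing Spark APIs, and the
actual implementation may look quite different.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical stand-in for one Kafka topic-partition offset range."""
    topic: str
    partition: int
    start: int  # inclusive
    end: int    # exclusive


def split_by_size(rng, avg_msg_bytes, max_bytes_per_partition):
    """Split one offset range into contiguous sub-ranges whose estimated
    size (message count * avg_msg_bytes) stays within max_bytes_per_partition."""
    if avg_msg_bytes <= 0 or max_bytes_per_partition <= 0:
        raise ValueError("sizes must be positive")

    n_msgs = rng.end - rng.start
    if n_msgs <= 0:
        return [rng]  # nothing to split

    estimated_bytes = n_msgs * avg_msg_bytes
    # At least one split, and never more splits than messages.
    n_splits = min(n_msgs, max(1, math.ceil(estimated_bytes / max_bytes_per_partition)))

    # Spread the messages as evenly as possible across the splits.
    base, extra = divmod(n_msgs, n_splits)
    pieces = []
    start = rng.start
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        pieces.append(OffsetRange(rng.topic, rng.partition, start, start + size))
        start += size
    return pieces
```

For example, a range of 10,000 messages with an avg msg size of 1 KiB and a
maxBytesPerPartition of 4 MiB is estimated at about 9.8 MiB, so it would be
split into 3 Spark partitions, while a range of 100 messages stays as 1.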