[
https://issues.apache.org/jira/browse/SPARK-29799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zengrui updated SPARK-29799:
----------------------------
Attachment: 0001-add-implementation-for-issue-SPARK-29799.patch
> Split a kafka partition into multiple KafkaRDD partitions in the kafka
> external plugin for Spark Streaming
> ----------------------------------------------------------------------------------------------------------
>
> Key: SPARK-29799
> URL: https://issues.apache.org/jira/browse/SPARK-29799
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 2.1.0, 2.4.3
> Reporter: zengrui
> Priority: Major
> Attachments: 0001-add-implementation-for-issue-SPARK-29799.patch
>
>
> When we use Spark Streaming to consume records from Kafka, the generated
> KafkaRDD's partition count equals the Kafka topic's partition count, so we
> cannot use more CPU cores to execute the streaming task unless we increase the
> topic's partition count, and we cannot increase the topic's partition count
> indefinitely.
> I propose that we split a Kafka partition into multiple KafkaRDD partitions,
> with the split factor configurable, so the streaming task can use more CPU
> cores.
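> A minimal sketch of the idea (hypothetical; not the code in the attached
> patch), assuming the kafka010 OffsetRange API: a helper splits one Kafka
> partition's offset range into several contiguous sub-ranges, each of which
> could back its own KafkaRDD partition.
> {code:scala}
> import org.apache.spark.streaming.kafka010.OffsetRange
>
> // Hypothetical helper: split one Kafka partition's offset range into
> // `numSplits` contiguous sub-ranges, so each sub-range could back its own
> // KafkaRDD partition and be processed on a separate core.
> def splitOffsetRange(range: OffsetRange, numSplits: Int): Seq[OffsetRange] = {
>   require(numSplits > 0, "numSplits must be positive")
>   val total = range.untilOffset - range.fromOffset
>   if (total <= 0 || numSplits == 1) {
>     Seq(range)
>   } else {
>     val step = math.max(1L, math.ceil(total.toDouble / numSplits).toLong)
>     (range.fromOffset until range.untilOffset by step).map { start =>
>       val end = math.min(start + step, range.untilOffset)
>       OffsetRange(range.topic, range.partition, start, end)
>     }
>   }
> }
>
> // Example: 1,000,000 pending records in one topic partition, split 4 ways,
> // yielding 4 KafkaRDD partitions instead of 1.
> val subRanges = splitOffsetRange(OffsetRange("events", 0, 0L, 1000000L), 4)
> subRanges.foreach(println)
> {code}
> Note that processing sub-ranges of one Kafka partition in parallel gives up
> per-partition ordering, so the split factor should stay opt-in via
> configuration as described above.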
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]