[
https://issues.apache.org/jira/browse/SPARK-29799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zengrui updated SPARK-29799:
----------------------------
Attachment: 0001-add-implementation-for-issue-SPARK-29799.patch
> Split a kafka partition into multiple KafkaRDD partitions in the kafka
> external plugin for Spark Streaming
> ----------------------------------------------------------------------------------------------------------
>
> Key: SPARK-29799
> URL: https://issues.apache.org/jira/browse/SPARK-29799
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 2.1.0, 2.4.3
> Reporter: zengrui
> Priority: Major
> Attachments: 0001-add-implementation-for-issue-SPARK-29799.patch
>
>
> When we use Spark Streaming to consume records from Kafka, the generated
> KafkaRDD's partition count equals the Kafka topic's partition count, so we
> cannot use more CPU cores to execute the streaming task unless we increase the
> topic's partition count, and we cannot increase the topic's partition count
> indefinitely.
> I propose that we split a Kafka partition into multiple KafkaRDD partitions,
> with the split factor configurable, so the streaming task can use more CPU
> cores.
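> A minimal sketch of the idea (hypothetical; not the code in the attached
> patch), assuming the kafka010 OffsetRange API: a helper splits one Kafka
> partition's offset range into several contiguous sub-ranges, each of which
> could back its own KafkaRDD partition.
> {code:scala}
> import org.apache.spark.streaming.kafka010.OffsetRange
>
> // Hypothetical helper: split one Kafka partition's offset range into
> // `numSplits` contiguous sub-ranges, so each sub-range could back its own
> // KafkaRDD partition and be processed on a separate core.
> def splitOffsetRange(range: OffsetRange, numSplits: Int): Seq[OffsetRange] = {
>   require(numSplits > 0, "numSplits must be positive")
>   val total = range.untilOffset - range.fromOffset
>   if (total <= 0 || numSplits == 1) {
>     Seq(range)
>   } else {
>     val step = math.max(1L, math.ceil(total.toDouble / numSplits).toLong)
>     (range.fromOffset until range.untilOffset by step).map { start =>
>       val end = math.min(start + step, range.untilOffset)
>       OffsetRange(range.topic, range.partition, start, end)
>     }
>   }
> }
>
> // Example: 1,000,000 pending records in one topic partition, split 4 ways,
> // yielding 4 KafkaRDD partitions instead of 1.
> val subRanges = splitOffsetRange(OffsetRange("events", 0, 0L, 1000000L), 4)
> subRanges.foreach(println)
> {code}
> Note that processing sub-ranges of one Kafka partition in parallel gives up
> per-partition ordering, so the split factor should stay opt-in via
> configuration as described above.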
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]