[ 
https://issues.apache.org/jira/browse/SPARK-51111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-51111:
-------------------------
    Attachment:     (was: MockKafkaJob-1.scala)

> Streaming job gets stuck during the startup of Driver for the consumers are 
> in the rebalance
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-51111
>                 URL: https://issues.apache.org/jira/browse/SPARK-51111
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 3.0.0
>            Reporter: Mars
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: KafkaJobTest.scala
>
>
> When multiple Kafka `InputDStream` are used (number exceeding the  Driver's 
> physical machine `Runtime.getRuntime.availableProcessors`), and the consumers 
> are configured with the same group id. Driver will remains stuck and log 
> 'Request joining group due to: group is already rebalancing'.
> It is more likely to occur in the k8s environment, because the cores of the 
> Driver is exactly the same as the cores of the physical machine 
> (`Runtime.getRuntime.availableProcessors`) .
> ROOT CAUSE:
> In the code,  Scala {{ParVector}} is used to initialize the streams, and this 
> variable creates a thread pool with a number of threads equal to 
> {{{}Runtime.getRuntime.availableProcessors{}}}. When the number of streams 
> exceeds the number of threads in the pool, it needs to wait for the previous 
> stream to complete before initializing the next one, leading to sequential 
> waiting.
> If the consumers of different topic InputStreams are configured with the same 
> group id, the Kafka server will trigger a rebalance operation due to the new 
> consumers joining the group. This rebalance operation can be time-consuming, 
> and when there are many streams, it can cause the initialization process to 
> hang very long time.
> We'd better to create a thread pool with a specified number of threads for 
> initialization. This can help the Driver complete the process faster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to