[
https://issues.apache.org/jira/browse/SPARK-51111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17924612#comment-17924612
]
Mars commented on SPARK-51111:
------------------------------
It can also be reproduced locally (my computer has 12 cores, so I create 13
DirectStreams).
demo:
[^MockKafkaJob.scala]
> Streaming job gets stuck during the startup of Driver for the consumers are
> in the rebalance
> --------------------------------------------------------------------------------------------
>
> Key: SPARK-51111
> URL: https://issues.apache.org/jira/browse/SPARK-51111
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 3.0.0
> Reporter: Mars
> Priority: Major
> Labels: pull-request-available
> Attachments: MockKafkaJob-1.scala, MockKafkaJob.scala
>
>
> When multiple Kafka `InputDStream` instances are used (more than the Driver
> machine's `Runtime.getRuntime.availableProcessors`) and their consumers are
> configured with the same group id, the Driver remains stuck and logs
> 'Request joining group due to: group is already rebalancing'.
> It is more likely to occur in a k8s environment, because the number of cores
> of the Driver is exactly the same as the number of cores of the physical
> machine (`Runtime.getRuntime.availableProcessors`).
> ROOT CAUSE:
> In the code, a Scala {{ParVector}} is used to initialize the streams, and it
> is backed by a thread pool whose number of threads equals
> {{Runtime.getRuntime.availableProcessors}}. When the number of streams
> exceeds the number of threads in the pool, each remaining stream has to wait
> for an earlier stream to finish before it can be initialized, so part of the
> initialization is serialized.
> If the consumers of the `InputDStream`s for different topics are configured
> with the same group id, the Kafka server triggers a rebalance whenever a new
> consumer joins the group. This rebalance can be time-consuming, and when
> there are many streams, it can cause the initialization process to hang for
> a very long time.
> It would be better to create a thread pool with a specified number of
> threads for initialization. This can help the Driver complete the process
> faster.
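
A minimal sketch of the queuing effect described in the quoted root cause, and of what a pool with a specified size changes. It is not Spark or Kafka code: `StreamStartSketch` and `allStartTogether` are hypothetical names, each "stream start" is only a task that waits for all the other tasks to be running at the same time, and the latch is a loose stand-in for consumers that all have to be in the group together.

```scala
import java.util.concurrent.{Callable, CountDownLatch, Executors, TimeUnit}

object StreamStartSketch {
  // Hypothetical helper (not a Spark API): submits `streams` start tasks to a
  // fixed pool of `poolSize` threads. Each task signals that it is running and
  // then waits up to `timeoutMs` for every other task to be running as well.
  // Returns true only if all tasks observed all the others running.
  def allStartTogether(streams: Int, poolSize: Int, timeoutMs: Long): Boolean = {
    val pool = Executors.newFixedThreadPool(poolSize)
    val started = new CountDownLatch(streams)
    try {
      val futures = (1 to streams).map { _ =>
        pool.submit(new Callable[Boolean] {
          override def call(): Boolean = {
            started.countDown()
            // false if the timeout elapses before all tasks have started
            started.await(timeoutMs, TimeUnit.MILLISECONDS)
          }
        })
      }
      // Wait for every task to finish before combining the results.
      futures.map(_.get()).forall(identity)
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    val cores = 12   // assumed pool size, matching the 12-core machine above
    val streams = 13 // one more stream than threads

    // Pool sized like the core count: the 13th task is queued behind the first 12.
    println(allStartTogether(streams, cores, 500L))
    // Pool sized to the number of streams: no task has to queue.
    println(allStartTogether(streams, streams, 500L))
  }
}
```

With the pool sized to the core count, the first 12 tasks occupy every thread and wait for a 13th task that cannot start until one of them gives up. Sizing the pool to the number of streams removes that wait in this sketch; whether the same sizing is the right choice for the Driver is an assumption here, not something the issue states.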