Jitmisra opened a new pull request, #280:
URL: https://github.com/apache/kvrocks-controller/pull/280
In the current implementation of parallelProbeNodes(), a new goroutine is
spawned for every node in the cluster without any limit on concurrent
operations. This can lead to resource exhaustion in large deployments with many
nodes.
`func (c *ClusterChecker) parallelProbeNodes(ctx context.Context, cluster
*store.Cluster) {
for i, shard := range cluster.Shards {
for _, node := range shard.Nodes {
go func(shardIdx int, n store.Node) {
// Node probing code...
}(i, node)
}
}
// No waiting mechanism or concurrency control
}`
Problems
Resource Exhaustion: In large clusters (hundreds or thousands of nodes),
this creates too many goroutines at once
Network Floods: All probes happen simultaneously, potentially overwhelming
network connections
No Completion Guarantee: The function returns immediately without ensuring
all probes complete
System Resource Strain: Can exhaust system resources like file descriptors
for network connections
Solution
This PR implements a semaphore pattern to limit the number of concurrent
operations while still maintaining parallelism:
This change makes the system more resilient and scalable for production
deployments of all sizes.
This PR implements a semaphore pattern to limit the number of concurrent
operations while still maintaining parallelism:
`func (c *ClusterChecker) parallelProbeNodes(ctx context.Context, cluster
*store.Cluster) {
// Limit concurrent operations to a reasonable number
semaphore := make(chan struct{}, 20) // Configurable limit
var wg sync.WaitGroup
for i, shard := range cluster.Shards {
for _, node := range shard.Nodes {
wg.Add(1)
go func(shardIdx int, n store.Node) {
defer wg.Done()
// Acquire semaphore before proceeding
semaphore <- struct{}{}
defer func() { <-semaphore }()
// Existing node probing code...
}(i, node)
}
}
// Wait for all probes to complete
wg.Wait()
}`
This change makes the system more resilient and scalable for production
deployments of all sizes.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]