Mit Desai created YUNIKORN-3118:
-----------------------------------

             Summary: Implement Parallel TryNode Evaluation for Improved Scheduling Performance
                 Key: YUNIKORN-3118
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3118
             Project: Apache YuniKorn
          Issue Type: Improvement
          Components: core - scheduler
            Reporter: Mit Desai
            Assignee: Mit Desai

h3. Summary

Implement parallel evaluation of nodes during scheduling to reduce scheduling latency in large clusters. This enhancement makes the degree of parallelism in the TryNode evaluation configurable while maintaining backward compatibility.

h3. Background

In large Kubernetes clusters with many nodes, the current sequential node evaluation can become a bottleneck during scheduling. Each allocation request evaluates candidate nodes one at a time, which increases scheduling latency, especially when many pods are pending.

h3. Proposed Solution

Add a new configuration parameter `trynodesthreadcount` that sets the number of parallel threads used for node evaluation during scheduling.

h3. Key Features:

1. {*}Configurable Parallelism{*}: New `trynodesthreadcount` parameter in the partition configuration
2. {*}Backward Compatibility{*}: Defaults to sequential behavior (value = 1) when not configured
3. {*}Thread Safety{*}: Synchronization of the evaluation goroutines using semaphores
4. {*}Performance Optimization{*}: Performs a dry-run evaluation before the actual allocation attempt

Configuration Example:

{code:yaml}
partitions:
  - name: default
    trynodesthreadcount: 20  # Enable parallel evaluation with 20 threads
    queues:
      - name: root
{code}
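To make the intended flow concrete, here is a minimal Go sketch of a bounded parallel dry-run followed by a sequential commit. It is an illustration only, not the YuniKorn implementation: the `node` type, its `fits` check, and the `tryNodes` function are hypothetical names, and node capacity is reduced to a single integer. For simplicity the sketch checks every node before committing; a real scheduler would likely stop early once a fit is found.

```go
package main

import "sync"

// node is a hypothetical stand-in for a scheduler node; only free capacity is modeled.
type node struct {
	name string
	free int
}

// fits is the side-effect-free "dry run": it only reads node state.
func (n *node) fits(ask int) bool { return n.free >= ask }

// tryNodes runs at most threadCount dry-run checks concurrently, then
// allocates on the first fitting node in iteration order.
// It returns the chosen node name, or "" when no node fits.
func tryNodes(nodes []*node, ask, threadCount int) string {
	if threadCount < 1 {
		threadCount = 1 // sequential fallback, matching the proposed default
	}

	fit := make([]bool, len(nodes))         // each goroutine writes only its own index
	sem := make(chan struct{}, threadCount) // buffered channel used as a counting semaphore
	var wg sync.WaitGroup

	for i, n := range nodes {
		wg.Add(1)
		sem <- struct{}{} // blocks while threadCount checks are already in flight
		go func(i int, n *node) {
			defer wg.Done()
			defer func() { <-sem }() // release the semaphore slot
			fit[i] = n.fits(ask)
		}(i, n)
	}
	wg.Wait()

	// The commit phase is sequential, so the chosen node does not depend
	// on goroutine timing and no node is mutated concurrently.
	for i, n := range nodes {
		if fit[i] {
			n.free -= ask
			return n.name
		}
	}
	return ""
}
```

With `threadCount` set to 1 the loop degenerates to one check at a time, which is how the sketch models the backward-compatible default. Keeping the allocation itself outside the goroutines is one way to keep the result deterministic; whether the actual change does this is not stated in the issue.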