nateab opened a new pull request, #27516:
URL: https://github.com/apache/flink/pull/27516

   ## What is the purpose of the change
   
     This pull request adds a secondary HashMap index (taskExecutorsByInstanceId) to the ResourceManager for O(1) lookups of WorkerRegistration by InstanceID. Previously, getWorkerByInstanceId() performed an O(n) linear scan through all registered TaskExecutors, which could become a performance bottleneck in clusters with many TaskExecutors. This addresses the TODO comment in the existing code: "Improve performance by having an index on the instanceId".
   
   ##  Brief change log
   
     - Added taskExecutorsByInstanceId HashMap field to ResourceManager for fast InstanceID lookups
     - Initialize the index in the ResourceManager constructor
     - Maintain index consistency by updating it alongside the primary taskExecutors map:
       - Add entry when TaskExecutor registers
       - Remove old entry when TaskExecutor re-registers (replacement)
       - Remove entry when TaskExecutor connection is closed
       - Clear index when ResourceManager state is cleared
     - Replaced O(n) loop in getWorkerByInstanceId() with O(1) HashMap lookup
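
     For reviewers who want the bookkeeping in one place, the following is a minimal, self-contained sketch of the pattern described in the change log. It is not the code in this PR: the class WorkerIndexSketch, the Registration type, the String keys (standing in for ResourceID and InstanceID), the Optional return type, and the method names register, close, clear, and indexSize are all hypothetical simplifications chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical, simplified stand-in for the ResourceManager bookkeeping; not Flink code. */
class WorkerIndexSketch {

    /** Stand-in for WorkerRegistration: pairs a resource ID with an instance ID. */
    static final class Registration {
        final String resourceId;
        final String instanceId;

        Registration(String resourceId, String instanceId) {
            this.resourceId = resourceId;
            this.instanceId = instanceId;
        }
    }

    // Primary map, keyed by resource ID (assumed to mirror taskExecutors).
    private final Map<String, Registration> taskExecutors = new HashMap<>();

    // Secondary index, keyed by instance ID (assumed to mirror taskExecutorsByInstanceId).
    private final Map<String, Registration> taskExecutorsByInstanceId = new HashMap<>();

    /** Registers a worker; a re-registration drops the replaced entry from the index first. */
    void register(String resourceId, String instanceId) {
        Registration registration = new Registration(resourceId, instanceId);
        Registration old = taskExecutors.put(resourceId, registration);
        if (old != null) {
            taskExecutorsByInstanceId.remove(old.instanceId);
        }
        taskExecutorsByInstanceId.put(instanceId, registration);
    }

    /** Removes a worker from both maps when its connection is closed. */
    void close(String resourceId) {
        Registration removed = taskExecutors.remove(resourceId);
        if (removed != null) {
            taskExecutorsByInstanceId.remove(removed.instanceId);
        }
    }

    /** Clears both maps together so they cannot drift apart. */
    void clear() {
        taskExecutors.clear();
        taskExecutorsByInstanceId.clear();
    }

    /** Single hash lookup instead of a scan over every registration. */
    Optional<Registration> getWorkerByInstanceId(String instanceId) {
        return Optional.ofNullable(taskExecutorsByInstanceId.get(instanceId));
    }

    /** Number of entries in the secondary index; useful for consistency checks. */
    int indexSize() {
        return taskExecutorsByInstanceId.size();
    }
}
```

     The point of the sketch is that every mutation of the primary map has a matching mutation of the index, including the re-registration case, where the stale instance ID must be removed before the new one is added.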
   
   ##  Verifying this change
   
     This change is already covered by existing tests, such as:
   
     - ResourceManagerTaskExecutorTest (6 tests) - covers TaskExecutor registration, re-registration, and disconnection scenarios
     - ResourceManagerTest (14 tests) - covers general ResourceManager functionality
     - ActiveResourceManagerTest (18 tests) - covers the releaseResource() path, which is the primary caller of getWorkerByInstanceId()
   
     All 75 ResourceManager-related tests pass with the changes.
   
   ##  Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with @Public(Evolving): no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes - ResourceManager is involved in TaskExecutor registration and resource management, but this is an internal optimization that does not change behavior
     - The S3 file system connector: no
   
   ##  Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? not applicable


