weiqingy opened a new pull request, #727:
URL: https://github.com/apache/flink-agents/pull/727

   Linked issue: #724
   
   ### Purpose of change
   
   Backport the resource-cache thread-safety fix to `release-0.2`.
   
   `AgentPlan#getResource` performs a non-atomic check-create-cache sequence. 
When two actions concurrently trigger the first access to the same uncached 
resource, both can invoke the provider and create duplicate instances 
(duplicate model/tool init, duplicate metric registration) before the last 
writer wins the cache.
   
   On `main` this was addressed as a side effect of #548 (f8eac10), which moved 
resource caching out of `AgentPlan` into a dedicated `ResourceCache` whose 
`getResource` is synchronized. That refactor never landed on `release-0.2`, so 
the race is still live there. Backporting the full `ResourceCache` extraction 
to a maintenance branch is heavier than warranted; the minimal equivalent is to 
guard the lazy resolution with a reentrant lock — the same mechanism `main` 
relies on.
   
   This PR applies that minimal guard on both language sides:
   
   - Java: make `AgentPlan#getResource` `synchronized`. The intrinsic lock is 
reentrant, so the nested resolution callback can still call back into 
`getResource` on the same thread.
   - Python: guard `AgentPlan.get_resource` with a per-instance reentrant lock. 
The async action thread pool runs actions on multiple threads, and 
`provider.provide` can release the GIL when it calls into Java via Pemja, so 
the same race exists. The lock is a `cached_property` (not a Pydantic private 
attribute) so it stays out of model equality.
   
   ### Tests
   
   - Java: 
`AgentPlanTest#getResourceShouldCreateOnlyOneInstanceUnderConcurrentAccess` — 
two threads race on the first access to a shared resource whose provider 
sleeps; asserts a single instance is created and both callers see it.
   - Python: `test_get_resource_single_instance_under_concurrent_access` — same 
contract via two threads and a `threading.Barrier`; asserts the provider runs 
once.
   
   Both tests fail without the corresponding guard and pass with it.
   
   ### API
   
   No public API changes.
   
   ### Documentation
   
   - [ ] `doc-needed`
   - [x] `doc-not-needed`
   - [ ] `doc-included`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to