weiqingy opened a new pull request, #509:
URL: https://github.com/apache/flink-agents/pull/509

   Linked issue: #508 
   
   ### Purpose of change
   
   Fix a crash that occurs when restoring a Python async action from a checkpoint.
                                                                                
                          
   **Root cause:** Python coroutines cannot be serialized. When a checkpoint captures state while an async action is in progress (e.g., waiting for an LLM response), the coroutine object is lost. On restore, the awaitable reference is `None`, causing:

   `AttributeError: 'NoneType' object has no attribute 'send'`
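
The root cause can be reproduced outside Flink Agents with a minimal sketch. It uses `pickle` only as a stand-in for whatever serialization the checkpoint path actually performs (an assumption), and `wait_for_llm` is a hypothetical placeholder for an in-flight async action:

```python
import pickle


async def wait_for_llm():
    # Hypothetical placeholder for an async action awaiting an LLM response.
    return "response"


coro = wait_for_llm()

# Step 1: a coroutine object cannot be serialized, so it cannot be
# carried across a checkpoint.
try:
    pickle.dumps(coro)
    pickled = True
except TypeError:
    pickled = False
finally:
    coro.close()  # avoid a "coroutine was never awaited" warning

# Step 2: after restore, the reference that used to hold the coroutine
# is None, and resuming it fails with the error quoted above.
restored_awaitable = None
try:
    restored_awaitable.send(None)
    send_error = None
except AttributeError as exc:
    send_error = str(exc)

print(pickled)
print(send_error)
```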
                          
                                                                                
                          
   **Fix:**

   - Detect a `None` awaitable in `PythonActionExecutor.callPythonAwaitable()`.
   - Throw `AwaitableLostException` to signal that the awaitable was lost.
   - `PythonGeneratorActionTask` catches this exception and re-executes the action from the beginning.
   - The durable execution cache ensures that already-completed calls are skipped.
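
The control flow of the fix can be modeled in a few lines of plain Python. This is a simplified sketch, not the code in this PR: the real classes are Java, and `AwaitableLost`, `call_awaitable`, `DurableCache`, `Suspend`, and `run_task` are hypothetical names chosen only to mirror the four steps listed above.

```python
class AwaitableLost(Exception):
    """Models AwaitableLostException: the awaitable did not survive restore."""


def call_awaitable(awaitable):
    # Models the None check described for callPythonAwaitable().
    if awaitable is None:
        raise AwaitableLost("awaitable missing after restore")
    try:
        awaitable.send(None)
    except StopIteration as done:
        return True, done.value  # the action finished
    return False, None  # the action is still suspended


class DurableCache:
    """Hypothetical stand-in for the durable execution cache."""

    def __init__(self):
        self.results = {}
        self.executions = 0

    def call(self, key, fn):
        # Only run fn if this call has not already completed.
        if key not in self.results:
            self.executions += 1
            self.results[key] = fn()
        return self.results[key]


class Suspend:
    """Awaitable that suspends once, like a pending LLM call."""

    def __await__(self):
        yield


def make_action(cache):
    async def action():
        first = cache.call("step-1", lambda: "plan")
        await Suspend()  # a checkpoint may be taken while suspended here
        second = cache.call("step-2", lambda: "answer")
        return f"{first}/{second}"

    return action


def run_task(action, awaitable):
    # Models the catch-and-restart behaviour described for the task.
    while True:
        try:
            finished, value = call_awaitable(awaitable)
        except AwaitableLost:
            awaitable = action()  # re-execute the action from the beginning
            continue
        if finished:
            return value


cache = DurableCache()
action = make_action(cache)

in_flight = action()
call_awaitable(in_flight)  # runs step-1, then suspends
in_flight.close()          # the coroutine is lost at checkpoint time

result = run_task(action, None)  # restore: the awaitable reference is None
print(result, cache.executions)
```

In this model `step-1` runs only once even though the action body is entered twice, which is the property the durable execution cache is relied on to provide.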
   
   ### Tests
   
   - Manual testing: run `react_agent_example.py`, trigger an LLM timeout, and verify that the job recovers instead of crashing.
   - No unit test added: these classes depend on Pemja (Python interpreter), which is difficult to mock, so the fix is better validated via e2e testing.

   ### API
   
   No public API changes.

   ### Documentation
   
   - [ ] `doc-needed` <!-- Your PR changes impact docs -->
   - [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
   - [ ] `doc-included` <!-- Your PR already contains the necessary documentation updates -->
   

