weiqingy commented on PR #548:
URL: https://github.com/apache/flink-agents/pull/548#issuecomment-3982608733

   I checked the CI failures: both are LLM-dependent e2e tests, and neither
appears to be caused by this PR.
      
   Test 1 (react_agent_test): the output 4444 = 2123 + 2321 shows the
ResourceCache IS working correctly: the chat model was resolved, and the add
tool was resolved and called successfully. The LLM (qwen3:1.7b) simply stopped
after one tool call instead of continuing with multiply(4444, 312). This is
LLM non-determinism, not a caching bug.
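   To make the expected behavior concrete, here is a minimal sketch of the
tool-call chain the test prompt asks for. The `add` and `multiply` functions
below are hypothetical stand-ins for the test's tools, not the actual
flink-agents harness; the point is only that the model completed step 1 and
skipped step 2.

   ```python
   # Hypothetical stand-ins for the test's tools (not the real harness).
   def add(a: int, b: int) -> int:
       return a + b

   def multiply(a: int, b: int) -> int:
       return a * b

   # Expected two-step tool-call chain in react_agent_test:
   step1 = add(2123, 2321)       # qwen3:1.7b did make this call -> 4444
   step2 = multiply(step1, 312)  # the model stopped before this call -> 1386528
   print(step1, step2)
   ```

   The observed 4444 in the output means everything up to and including
step 1 worked; only the model's decision to continue the chain was missing.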
   
   Test 2 (long_term_memory_test): this runs on the Flink remote runner, where
there is exactly ONE FlinkRunnerContext with ONE ResourceCache, so the
behavior is identical to before this PR. The failure is the assertion
len(doc) == 1 after LLM-based compaction using qwen3:8b: if the model's
summarization response is malformed, compaction produces incorrect output and
the assertion fails.
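   For context, the failing check is essentially of this shape. This is an
illustrative sketch only (the `docs` list and `check_compaction` name are
assumptions, not the real test code): compaction should collapse the stored
memory into a single summarized document, so a malformed LLM summary that
leaves extra documents behind trips the length assertion.

   ```python
   # Illustrative only: after LLM-based compaction, the long-term memory
   # should hold exactly one summarized document. A malformed summarization
   # response can leave originals uncompacted, so the length check fails.
   def check_compaction(docs: list[str]) -> None:
       assert len(docs) == 1, f"expected 1 compacted doc, got {len(docs)}"

   check_compaction(["summary of earlier conversation"])  # passes
   ```

   Under this reading, the assertion value depends entirely on the quality of
the qwen3:8b summarization response, which is why a re-run can produce a
different result.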
   
   We can re-run the CI to confirm flakiness; if it fails again with different
assertion values, that would further support LLM non-determinism.
   
   @wenjin272 do you have access to re-run the CI tests? It looks like admin 
rights are required.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
