wenjin272 commented on PR #667:
URL: https://github.com/apache/flink-agents/pull/667#issuecomment-4457592067

   > The one CI failure (`it-python [java-17] [python-3.12] [flink-2.1]`) is the known `test_react_agent_on_local_runner` LLM flake against Ollama `qwen3:1.7b`, not caused by this PR:
   > 
   > ```
   > FAILED flink_agents/e2e_tests/e2e_tests_integration/react_agent_test.py::test_react_agent_on_local_runner
   >   - assert 432596736 == 1386528
   > ```
   > 
   > The test expects `4444 × 312 = 1386528`, but the LLM made an extra, unnecessary `multiply(1386528, 312)` call and returned `432596736`. The test source has a comment right next to the assertion: _"This may be caused by the LLM response does not match the output schema, you can rerun this case."_
   > 
   > This same failure (with the exact same numbers, `432596736 == 1386528`) is currently failing on `main` at `b38ae21`, the commit this PR is rebased onto, and on several other recent main-branch runs. The failing test runs through the Python `local_runner`, which logs `"Local runner does not support durable execution; recovery is not available."`, so the Java `DurableExecutionManager` / `ActionExecutionOperator` paths changed by this PR are never exercised.
   > 
   > Will re-run CI.
   
   I believe we need to improve the stability and observability of CI in version 0.4. If you encounter any unstable cases, please contact me and I will rerun them; I now have permission to rerun failed CI jobs.
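As a quick sanity check on the arithmetic in the quoted analysis, here is a minimal sketch showing that the observed value is exactly one redundant multiplication by 312 applied to the expected result. The `multiply` function is a hypothetical stand-in for the agent's tool, not the test's actual implementation.

```python
# Hypothetical stand-in for the agent's multiply tool (not the real test code).
def multiply(a: int, b: int) -> int:
    return a * b

expected = multiply(4444, 312)      # the single call the test expects
observed = multiply(expected, 312)  # the extra, unnecessary call the LLM made

print(expected)  # 1386528
print(observed)  # 432596736
```

This matches the assertion message `432596736 == 1386528`, which supports the reading that the failure comes from the LLM's tool-calling behaviour rather than from a wrong product.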


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.