weiqingy commented on code in PR #713:
URL: https://github.com/apache/flink-agents/pull/713#discussion_r3328358218


##########
python/flink_agents/e2e_tests/e2e_tests_resource_cross_language/chat_model_cross_language_test.py:
##########
@@ -106,5 +106,6 @@ def test_java_chat_model_integration(
             with file.open() as f:
                 actual_result.extend(f.readlines())
 
-    assert "3" in actual_result[0]
-    assert "cat" in actual_result[1]
+    joined = "\n".join(actual_result).lower()
+    assert "3" in joined, f"math answer missing '3': {actual_result!r}"

Review Comment:
   Now that the check searches the whole joined, lowercased output, `"3"` is a 
single digit that is very likely to appear somewhere regardless of the math 
result (in the cat answer, in an incidental number, or in the qwen3 reasoning 
trace), so this assertion may pass even if the math path regressed. The yaml 
test's `"22"` is specific enough to avoid that, but `"3"` (the answer to "1+2") 
has no obvious more-specific token, and the join is exactly what makes the 
check order-insensitive. Is accepting the weaker math signal the intended 
trade-off here for a flaky-test hotfix, or is a stricter check worth it?
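
   If a stricter check is wanted, one possible shape is sketched below. It is 
only an illustration, not code from this PR: `contains_standalone_number` is a 
hypothetical helper, and the sample lines are made up. It stays 
order-insensitive by scanning every line, but only accepts `3` as its own 
token, so `qwen3`, `13`, or `3.5` would not satisfy it. A reasoning trace that 
happens to say "step 3" would still match, so this narrows the false-positive 
window rather than closing it.

```python
import re


def contains_standalone_number(lines, number):
    """Return True if any line contains `number` as a standalone token.

    Hypothetical helper: rejects matches glued to letters, digits, or a
    decimal point (e.g. "qwen3", "13", "3.5"), but accepts "= 3" or "3.".
    """
    token = re.escape(str(number))
    pattern = re.compile(rf"(?<![\w.]){token}(?!\w|\.\d)")
    return any(pattern.search(line) for line in lines)


# Made-up sample output; line order does not matter.
actual_result = ["The cat is a small animal.\n", "1 + 2 = 3\n"]
assert contains_standalone_number(actual_result, 3), (
    f"math answer missing standalone '3': {actual_result!r}"
)
```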



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
