Re: [PR] [hotfix] Make cross-language e2e test order-insensitive and add request time [flink-agents]

via GitHub Sat, 30 May 2026 00:52:13 -0700


rosemarYuan commented on code in PR #713:
URL: https://github.com/apache/flink-agents/pull/713#discussion_r3328416464



##########
python/flink_agents/e2e_tests/e2e_tests_resource_cross_language/chat_model_cross_language_test.py:
##########
@@ -106,5 +106,6 @@ def test_java_chat_model_integration(
             with file.open() as f:
                 actual_result.extend(f.readlines())
 
-    assert "3" in actual_result[0]
-    assert "cat" in actual_result[1]
+    joined = "\n".join(actual_result).lower()
+    assert "3" in joined, f"math answer missing '3': {actual_result!r}"

Review Comment:
   Thanks for flagging this — the concern is valid. A stricter math assertion 
would re-introduce flakiness, and those failures would be model-capability 
noise rather than actual cross-language regressions. So I think accepting the 
weaker math signal is a reasonable trade-off for this hotfix.
   
   The way I see it, there are two different problems here:
   **(1) E2E cross-language behavioral consistency**: the primary goal of this 
test. The order-insensitive join + lowercased check addresses this, and the 
current approach prioritizes it.
   **(2) Model output quality validation**: a harder problem, and one that a 
1.7B model on unstable CI hardware is fundamentally ill-suited to. If we want 
to strengthen this later, some possible directions might be:
   - Upgrading the CI model to one with more reliable arithmetic capability;
   - Structuring and formalizing the prompt (e.g., explicit chain-of-thought 
with strict output formatting);
   - Adding a post-inference verification step that checks whether the model 
output meets the prompt's expectations before the assertion runs.
   These improvements to (2) are out of scope for this hotfix. Would love to 
hear your thoughts on whether this trade-off works for now, or if you'd prefer 
a different approach.
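
   For illustration only, the post-inference verification idea could be 
sketched roughly as below. `Verdict` and `classify_output` are hypothetical 
names, not part of flink-agents, and the rule that "no non-empty output means a 
pipeline failure" is an assumption made for this sketch. The point is to 
separate problem (1) from problem (2) before asserting.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical outcome categories for one e2e run."""
    PASS = "pass"
    MODEL_NOISE = "model_noise"            # output arrived, but the answer is off
    PIPELINE_FAILURE = "pipeline_failure"  # nothing arrived at all


def classify_output(lines, expected_tokens):
    """Classify the collected output lines without depending on their order.

    Assumption: an entirely empty result points at the cross-language
    pipeline, while a non-empty result that lacks an expected token points
    at model capability.
    """
    if not any(line.strip() for line in lines):
        return Verdict.PIPELINE_FAILURE

    # Same order-insensitive, case-insensitive check as in the hotfix.
    joined = "\n".join(lines).lower()
    if all(token.lower() in joined for token in expected_tokens):
        return Verdict.PASS
    return Verdict.MODEL_NOISE
```

   A test could then fail hard only on `Verdict.PIPELINE_FAILURE` and treat 
`Verdict.MODEL_NOISE` as a skip or a warning, for example via `pytest.skip`. 
Whether that is the right policy is a separate discussion.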



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

