xintongsong opened a new pull request, #725: URL: https://github.com/apache/flink-agents/pull/725
Backport of #712 to `release-0.2`. The change keeps token-metric recording on the operator/mailbox (action) thread instead of the durable-execution pool thread, mirroring the Python side of the framework.

### Scope on release-0.2

The original PR touched 7 connections; `release-0.2` only ships 4 of them. The Bedrock, AzureOpenAI and OpenAIResponses connections are not present on this branch and are intentionally excluded from the backport.

Covered connections:

- Ollama
- Anthropic
- AzureAI
- OpenAI (single connection on `release-0.2`; on `main` it has since been split into Completions / Responses / AzureOpenAI)

### What changed

- `BaseChatModelSetup` gains `public recordTokenMetrics(String, long, long)`, the Python-parity record site. The setup's bound metric group is the action metric group, so **the emitted metric path and counter names are unchanged**.
- `BaseChatModelConnection.recordTokenMetrics(...)` and the now-dead `connection.setMetricGroup(...)` forwarding in `BaseChatModelSetup` are removed.
- Each of the 4 connections' `chat()` now stashes `model_name` / `promptTokens` / `completionTokens` into the response `ChatMessage.extraArgs` instead of recording inside `chat()`.
- `ChatModelAction` records **after** the durable call returns (before structured-output reassignment, which would drop the keys) via a new `static recordChatTokenMetrics(...)` helper.
- `RunnerContext` metric-group getter javadoc now documents that the returned group must only be accessed from the operator/mailbox thread, not inside a durable callable.

Recording is gated identically to Python: non-empty model name and both token counts greater than zero; values are read as `Number` and converted with `longValue()` to tolerate `Integer`/`Long` across durable recovery.

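
For reviewers who want the gating rule in one place, the following is a minimal, self-contained sketch of it, not the code in this PR. The class name `TokenMetricsGateSketch`, the `TokenMetricsSink` interface, and the boolean return value are hypothetical stand-ins introduced for illustration; only the three key names and the `Number#longValue()` conversion come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; names here are hypothetical, not the flink-agents API. */
final class TokenMetricsGateSketch {

    /** Hypothetical stand-in for the setup's recordTokenMetrics(String, long, long). */
    interface TokenMetricsSink {
        void recordTokenMetrics(String modelName, long promptTokens, long completionTokens);
    }

    /**
     * Records token metrics only when the model name is a non-empty String and both
     * token counts are numeric and greater than zero. Returns true if it recorded.
     */
    static boolean recordChatTokenMetrics(Map<String, Object> extraArgs, TokenMetricsSink sink) {
        Object model = extraArgs.get("model_name");
        Object prompt = extraArgs.get("promptTokens");
        Object completion = extraArgs.get("completionTokens");

        // Skip when the model name is missing, not a String, or empty.
        if (!(model instanceof String) || ((String) model).isEmpty()) {
            return false;
        }
        // Skip when either token value is missing or non-numeric.
        if (!(prompt instanceof Number) || !(completion instanceof Number)) {
            return false;
        }
        // Read through Number so that Integer and Long are both accepted.
        long promptTokens = ((Number) prompt).longValue();
        long completionTokens = ((Number) completion).longValue();

        // Skip when either count is zero or negative.
        if (promptTokens <= 0 || completionTokens <= 0) {
            return false;
        }
        sink.recordTokenMetrics((String) model, promptTokens, completionTokens);
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> extraArgs = new HashMap<>();
        extraArgs.put("model_name", "example-model");
        extraArgs.put("promptTokens", 12);       // Integer-typed value
        extraArgs.put("completionTokens", 34L);  // Long-typed value

        boolean recorded = recordChatTokenMetrics(
                extraArgs,
                (m, p, c) -> System.out.println(m + " prompt=" + p + " completion=" + c));
        System.out.println("recorded=" + recorded);
    }
}
```

Reading the values as `Number` rather than casting to `Long` is what lets the same check accept either boxed type, which matters if a value comes back as `Integer` after durable recovery.
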
This also fixes the same latent gap as on `main`: Python-backed chat models invoked from the Java action previously recorded **no** token metrics (the path bypasses the Java connection's recording); they are now captured once via the setup.

### Tests

- Relocated the base token-metrics test to the setup (`BaseChatModelSetupTokenMetricsTest`), mirroring the rename on `main`.
- New `ChatModelActionTest` covers `recordChatTokenMetrics`: records when all keys are present and positive; `Integer`-typed token values still recorded via `Number#longValue()`; skips when a key is missing or non-numeric; skips when a token is `0` or the model name is empty (Python parity).
- `./tools/build.sh -j` and module-level `mvn test` on api / plan / 4 covered connections all pass locally.

### API

Adds `public BaseChatModelSetup.recordTokenMetrics(String, long, long)` and removes the previously `protected` `BaseChatModelConnection.recordTokenMetrics(...)`. No change to public configuration, event, or resource APIs. Emitted metric names and paths are unchanged.

### Documentation

- [ ] `doc-needed`
- [x] `doc-not-needed`
- [ ] `doc-included`
