weiqingy opened a new pull request, #712: URL: https://github.com/apache/flink-agents/pull/712
Linked issue: #706

### Purpose of change

The Java built-in chat action recorded token metrics from *inside* the durable async callable: each chat-model connection called `BaseChatModelConnection.recordTokenMetrics(...)` within its `chat()` method, which runs on a durable-execution pool thread. The runtime metric group is meant to be used from the operator/mailbox (action) thread, so touching it from the callable crosses that boundary.

This brings Java to the same execution boundary as Python. Mirroring `chat_model_action.py`:

- Each of the 7 Java connections (Ollama, Anthropic, Bedrock, AzureAI, OpenAI Completions/Responses/AzureOpenAI) now stashes `model_name` / `promptTokens` / `completionTokens` into the response `ChatMessage.extraArgs` instead of recording inside `chat()`.
- `ChatModelAction` records **after** the durable call returns (before structured-output reassignment, which would drop the keys) via a new `BaseChatModelSetup.recordTokenMetrics(...)`, the Python-parity record site (`chat_model._record_token_metrics`). The setup's bound metric group is the action metric group, so the **emitted metric path and counter names are unchanged**.
- The old `BaseChatModelConnection.recordTokenMetrics(...)` and its now-dead `connection.setMetricGroup(...)` forwarding are removed.
- The `RunnerContext` metric-group getter Javadoc now documents that the returned group must only be accessed from the operator/mailbox thread, not inside a durable callable.

Recording matches Python's guard exactly: the model name must be non-empty and both token counts greater than zero; values are read as `Number` and converted with `longValue()` to tolerate `Integer`/`Long` across the Pemja bridge and durable recovery.
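The guard described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the PR's actual code: the class `TokenMetricsGuardSketch`, the `TokenRecorder` interface, and the `recordIfValid` method are hypothetical names, and the extra arguments are modelled as a plain `Map<String, Object>`. Only the three key names and the `Number.longValue()` conversion come from the description.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the record-after-the-call guard; not the PR's implementation. */
public class TokenMetricsGuardSketch {

    /** Hypothetical stand-in for the setup-side record method. */
    @FunctionalInterface
    public interface TokenRecorder {
        void record(String modelName, long promptTokens, long completionTokens);
    }

    /**
     * Records token usage only when the model name is a non-empty string and
     * both token counts are numeric and greater than zero.
     *
     * @return true if the recorder was invoked, false if the guard skipped it
     */
    public static boolean recordIfValid(Map<String, Object> extraArgs, TokenRecorder recorder) {
        if (extraArgs == null || recorder == null) {
            return false;
        }
        Object model = extraArgs.get("model_name");
        Object prompt = extraArgs.get("promptTokens");
        Object completion = extraArgs.get("completionTokens");

        // A missing or non-numeric value fails the instanceof checks and is skipped.
        if (!(model instanceof String)
                || !(prompt instanceof Number)
                || !(completion instanceof Number)) {
            return false;
        }

        String modelName = (String) model;
        // Reading as Number tolerates both Integer and Long values.
        long promptTokens = ((Number) prompt).longValue();
        long completionTokens = ((Number) completion).longValue();

        if (modelName.isEmpty() || promptTokens <= 0 || completionTokens <= 0) {
            return false;
        }
        recorder.record(modelName, promptTokens, completionTokens);
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> extraArgs = new HashMap<>();
        extraArgs.put("model_name", "example-model"); // placeholder model name
        extraArgs.put("promptTokens", Integer.valueOf(12)); // Integer-typed value
        extraArgs.put("completionTokens", Long.valueOf(34L)); // Long-typed value

        boolean recorded = recordIfValid(
                extraArgs,
                (name, p, c) -> System.out.println(name + " prompt=" + p + " completion=" + c));
        System.out.println("recorded=" + recorded);

        // A zero token count is skipped by the guard.
        extraArgs.put("completionTokens", 0);
        System.out.println("recorded=" + recordIfValid(extraArgs, (name, p, c) -> { }));
    }
}
```

In this sketch the caller decides where `recordIfValid` runs, so the recorder can be invoked on the action thread after the durable call has returned, rather than inside the callable.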
This also fixes a latent gap: Python-backed chat models invoked from the Java action previously recorded **no** token metrics (the path bypasses the Java connection's recording); they are now captured once.

### Tests

- Relocated the base token-metrics test to the setup (`BaseChatModelSetupTokenMetricsTest`), mirroring Python's `test_token_metrics.py`: records to the per-model sub-group counters, no-ops without a metric group, separate sub-groups per model, `getResourceType() == CHAT_MODEL`.
- Extended `ChatModelActionTest` with cases for the new `recordChatTokenMetrics` helper: records once when all keys are present and positive; `Integer`-typed token values still recorded via `Number.longValue()`; skips when a key is missing or non-numeric; skips when a token is `0` or the model name is empty (Python parity).
- `./tools/build.sh -j`, `./tools/ut.sh -j`, and `./tools/lint.sh -c` all pass.

### API

Adds `public BaseChatModelSetup.recordTokenMetrics(String, long, long)` (the Python-parity record site) and removes the previously `protected` `BaseChatModelConnection.recordTokenMetrics(...)`. No change to public configuration, event, or resource APIs. Emitted metric names and paths are unchanged.

### Documentation

- [ ] `doc-needed`
- [x] `doc-not-needed`
- [ ] `doc-included`
