[PR] [api][plan][integrations] Record built-in chat token metrics outside the async call boundary [flink-agents]

via GitHub Thu, 28 May 2026 23:12:45 -0700


weiqingy opened a new pull request, #712:
URL: https://github.com/apache/flink-agents/pull/712


   
   Linked issue: #706
   
   ### Purpose of change
   
   The Java built-in chat action recorded token metrics from *inside* the 
durable async callable: each chat-model connection called 
`BaseChatModelConnection.recordTokenMetrics(...)` within its `chat()` method, 
which runs on a durable-execution pool thread. The runtime metric group is 
meant to be used from the operator/mailbox (action) thread, so touching it from 
the callable crosses that boundary.
   
   This change brings Java to the same execution boundary as Python, mirroring 
`chat_model_action.py`:
   
   - Each of the 7 Java connections (Ollama, Anthropic, Bedrock, AzureAI, 
OpenAI Completions/Responses/AzureOpenAI) now stashes `model_name` / 
`promptTokens` / `completionTokens` into the response `ChatMessage.extraArgs` 
instead of recording inside `chat()`.
   - `ChatModelAction` records the metrics **after** the durable call returns, 
and before structured-output reassignment (which would drop the keys), via a 
new `BaseChatModelSetup.recordTokenMetrics(...)`. This is the Python-parity 
record site (`chat_model._record_token_metrics`). The setup's bound metric 
group is the action metric group, so the **emitted metric path and counter 
names are unchanged**.
   - The old `BaseChatModelConnection.recordTokenMetrics(...)` and its now-dead 
`connection.setMetricGroup(...)` forwarding are removed.
   - The `RunnerContext` metric-group getter Javadoc now documents that the 
returned group must only be accessed from the operator/mailbox thread, not 
inside a durable callable.
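
For readers unfamiliar with the boundary described above, here is a minimal, self-contained sketch of the stash-then-record pattern. It is not the flink-agents implementation: `MetricsBoundarySketch`, `CallerOwnedCounter`, and the plain `ExecutorService` are hypothetical stand-ins for the action, the runtime metric group, and the durable-execution pool. Only the three key names are taken from this PR description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MetricsBoundarySketch {
    /** Hypothetical counter that may only be touched by the thread that created it. */
    static final class CallerOwnedCounter {
        private final Thread owner = Thread.currentThread();
        private long value;

        void inc(long n) {
            if (Thread.currentThread() != owner) {
                throw new IllegalStateException("counter touched off its owner thread");
            }
            value += n;
        }

        long get() {
            return value;
        }
    }

    /** Simulated chat call: runs on the pool thread and only returns data. */
    static Map<String, Object> chat() {
        Map<String, Object> extraArgs = new HashMap<>();
        extraArgs.put("model_name", "example-model"); // illustrative value
        extraArgs.put("promptTokens", 12L);
        extraArgs.put("completionTokens", 34L);
        return extraArgs;
    }

    /** Submits the call to a pool, then records on the calling thread. */
    static long[] callThenRecord() {
        CallerOwnedCounter prompt = new CallerOwnedCounter();
        CallerOwnedCounter completion = new CallerOwnedCounter();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Map<String, Object>> task = MetricsBoundarySketch::chat;
            // The callable never sees the counters; it hands back plain data.
            Map<String, Object> extraArgs = pool.submit(task).get();
            // Recording happens here, back on the thread that owns the counters.
            prompt.inc(((Number) extraArgs.get("promptTokens")).longValue());
            completion.inc(((Number) extraArgs.get("completionTokens")).longValue());
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return new long[] {prompt.get(), completion.get()};
    }
}
```

In this sketch, calling `inc` from inside `chat()` would throw, which is the kind of cross-thread access the change avoids.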
   
   Recording matches Python's guard exactly: the model name must be non-empty 
and both token counts greater than zero; values are read as `Number` and 
converted with `longValue()` to tolerate `Integer`/`Long` across the Pemja 
bridge and durable recovery.
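
The guard described above could look roughly like the following sketch. The class `TokenMetricsGuardSketch`, the `TokenRecorder` interface, and the method `recordIfValid` are hypothetical names introduced for illustration; the actual helper in this PR may be shaped differently. The key names and the `Number.longValue()` conversion follow the description.

```java
import java.util.Map;

public class TokenMetricsGuardSketch {
    /** Hypothetical sink standing in for the setup's record method. */
    interface TokenRecorder {
        void record(String modelName, long promptTokens, long completionTokens);
    }

    /**
     * Records only when the model name is a non-empty string and both token
     * counts are numeric and greater than zero. Returns true if it recorded.
     */
    static boolean recordIfValid(Map<String, Object> extraArgs, TokenRecorder recorder) {
        Object model = extraArgs.get("model_name");
        Object prompt = extraArgs.get("promptTokens");
        Object completion = extraArgs.get("completionTokens");

        if (!(model instanceof String) || ((String) model).isEmpty()) {
            return false; // missing or empty model name
        }
        if (!(prompt instanceof Number) || !(completion instanceof Number)) {
            return false; // missing or non-numeric token count
        }
        // Number.longValue() accepts Integer and Long alike.
        long promptTokens = ((Number) prompt).longValue();
        long completionTokens = ((Number) completion).longValue();
        if (promptTokens <= 0 || completionTokens <= 0) {
            return false; // zero or negative counts are skipped
        }
        recorder.record((String) model, promptTokens, completionTokens);
        return true;
    }
}
```

Reading the values as `Number` rather than casting to `Long` means an `Integer` produced on one side of a bridge or by deserialization does not cause a `ClassCastException`.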
   
   This also fixes a latent gap: Python-backed chat models invoked from the 
Java action previously recorded **no** token metrics (the path bypasses the 
Java connection's recording); they are now captured once.
   
   ### Tests
   
   - Relocated the base token-metrics test to the setup 
(`BaseChatModelSetupTokenMetricsTest`), mirroring Python's 
`test_token_metrics.py`: records to the per-model sub-group counters, no-ops 
without a metric group, separate sub-groups per model, `getResourceType() == 
CHAT_MODEL`.
   - Extended `ChatModelActionTest` with cases for the new 
`recordChatTokenMetrics` helper: it records once when all keys are present and 
positive; `Integer`-typed token values are still recorded via 
`Number.longValue()`; it skips when a key is missing or non-numeric; and it 
skips when a token count is `0` or the model name is empty (Python parity).
   - `./tools/build.sh -j`, `./tools/ut.sh -j`, and `./tools/lint.sh -c` all 
pass.
   
   ### API
   
   Adds `public BaseChatModelSetup.recordTokenMetrics(String, long, long)` (the 
Python-parity record site) and removes the previously `protected` 
`BaseChatModelConnection.recordTokenMetrics(...)`. No change to public 
configuration, event, or resource APIs. Emitted metric names and paths are 
unchanged.
   
   ### Documentation
   
   - [ ] `doc-needed`
   - [x] `doc-not-needed`
   - [ ] `doc-included`
   


