[PR] [release-0.2][backport][api][plan][integrations] Record built-in chat token metrics outside the async call boundary (#712) [flink-agents]

via GitHub Mon, 01 Jun 2026 04:28:10 -0700


xintongsong opened a new pull request, #725:
URL: https://github.com/apache/flink-agents/pull/725


   Backport of #712 to `release-0.2`. The change keeps token-metric recording 
on the operator/mailbox (action) thread instead of the durable-execution pool 
thread, mirroring the Python side of the framework.
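A minimal sketch of the thread boundary follows. It assumes a plain `ExecutorService` stands in for the durable-execution pool and the calling thread stands in for the operator/mailbox thread. `ThreadBoundarySketch`, `ChatResult` and `callAndRecord` are hypothetical names, not flink-agents APIs. The pool task only returns the counts, and recording happens on the caller after `Future#get()` returns. Records require Java 16+.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical illustration (not flink-agents code): the pool thread computes and
// returns the token counts; the caller, standing in for the mailbox thread, records them.
final class ThreadBoundarySketch {

    /** Hypothetical result type carrying token counts out of the async call. */
    record ChatResult(String modelName, long promptTokens, long completionTokens) {}

    /** Name of the thread that performed the recording (for illustration only). */
    static volatile String recordingThread;

    static void recordTokenMetrics(String model, long prompt, long completion) {
        // The real framework would update the action metric group here;
        // this sketch only notes which thread is doing the recording.
        recordingThread = Thread.currentThread().getName();
    }

    static ChatResult callAndRecord(ExecutorService pool) throws Exception {
        // The "durable" call runs on a pool thread and touches no metric group.
        Future<ChatResult> future = pool.submit(() -> new ChatResult("some-model", 12, 34));
        ChatResult result = future.get();
        // Recording happens only after the call returns, on the calling thread.
        recordTokenMetrics(result.modelName(), result.promptTokens(), result.completionTokens());
        return result;
    }
}
```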
   
   ### Scope on release-0.2
   
   The original PR touched 7 connections; `release-0.2` only ships 4 of them. 
The Bedrock, AzureOpenAI and OpenAIResponses connections are not present on 
this branch and are intentionally excluded from the backport. Covered 
connections:
   
   - Ollama
   - Anthropic
   - AzureAI
   - OpenAI (single connection on `release-0.2`; on `main` it has since been 
split into Completions / Responses / AzureOpenAI)
   
   ### What changed
   
   - `BaseChatModelSetup` gains `public recordTokenMetrics(String, long, long)`, 
the Python-parity record site. The setup's bound metric group is the action 
metric group, so **the emitted metric path and counter names are unchanged**.
   - `BaseChatModelConnection.recordTokenMetrics(...)` and the now-dead 
`connection.setMetricGroup(...)` forwarding in `BaseChatModelSetup` are removed.
   - Each of the 4 connections' `chat()` now stashes `model_name` / 
`promptTokens` / `completionTokens` into the response `ChatMessage.extraArgs` 
instead of recording inside `chat()`.
   - `ChatModelAction` records **after** the durable call returns (before 
structured-output reassignment, which would drop the keys) via a new `static 
recordChatTokenMetrics(...)` helper.
   - `RunnerContext` metric-group getter javadoc now documents that the 
returned group must only be accessed from the operator/mailbox thread, not 
inside a durable callable.
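The stash-then-record flow can be sketched with simplified, hypothetical types. `Message`, `chat`, `runAction` and `record` are illustrative only and are not the actual `ChatMessage` / `ChatModelAction` code. The point the sketch shows is ordering: the keys are read from `extraArgs` before the structured-output step replaces the message, because the replacement carries no `extraArgs`. It uses `instanceof` patterns (Java 16+).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins; not the flink-agents classes.
final class ExtraArgsFlowSketch {

    /** Minimal message type with an extraArgs map (mirrors the idea, not the API). */
    static final class Message {
        final String content;
        final Map<String, Object> extraArgs = new HashMap<>();
        Message(String content) { this.content = content; }
    }

    /** Connection side: stash usage into extraArgs instead of recording it. */
    static Message chat(String prompt) {
        Message response = new Message("echo: " + prompt);
        response.extraArgs.put("model_name", "some-model");
        response.extraArgs.put("promptTokens", 5);       // an Integer, e.g. after recovery
        response.extraArgs.put("completionTokens", 7L);  // a Long
        return response;
    }

    /** Recorded values keyed by model name (stands in for the metric group). */
    static final Map<String, long[]> recorded = new HashMap<>();

    static Message runAction(String prompt) {
        Message response = chat(prompt);  // would run inside the durable call
        record(response);                 // record BEFORE the message is replaced
        // Structured-output handling builds a new message with empty extraArgs,
        // so recording after this line would find no keys.
        response = new Message(response.content.toUpperCase());
        return response;
    }

    static void record(Message m) {
        Object model = m.extraArgs.get("model_name");
        Object p = m.extraArgs.get("promptTokens");
        Object c = m.extraArgs.get("completionTokens");
        if (model instanceof String s && p instanceof Number pn && c instanceof Number cn) {
            recorded.put(s, new long[] {pn.longValue(), cn.longValue()});
        }
    }
}
```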
   
   Recording is gated identically to Python: non-empty model name and both 
token counts greater than zero; values are read as `Number` and converted with 
`longValue()` to tolerate `Integer`/`Long` across durable recovery.
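A hedged sketch of the gating rule follows, assuming the keys live in a `Map<String, Object>` like `ChatMessage.extraArgs`. `TokenMetricsGate` and `TokenSink` are hypothetical. `TokenSink` stands in for `BaseChatModelSetup.recordTokenMetrics(String, long, long)`, and the parameters of the real `recordChatTokenMetrics(...)` helper are not reproduced here. Java 16+ is needed for `instanceof` patterns.

```java
import java.util.Map;

// Sketch of the gating described above; key names follow the PR text,
// everything else is an illustrative assumption.
final class TokenMetricsGate {

    /** Hypothetical stand-in for BaseChatModelSetup.recordTokenMetrics(String, long, long). */
    interface TokenSink {
        void recordTokenMetrics(String modelName, long promptTokens, long completionTokens);
    }

    /** Returns true if metrics were recorded (the boolean exists only for testability). */
    static boolean recordChatTokenMetrics(Map<String, Object> extraArgs, TokenSink sink) {
        Object model = extraArgs.get("model_name");
        Object prompt = extraArgs.get("promptTokens");
        Object completion = extraArgs.get("completionTokens");
        // A missing or empty model name skips recording.
        if (!(model instanceof String modelName) || modelName.isEmpty()) return false;
        // Missing or non-numeric token values are skipped rather than failing the action.
        if (!(prompt instanceof Number p) || !(completion instanceof Number c)) return false;
        // Number#longValue() accepts Integer and Long alike.
        long promptTokens = p.longValue();
        long completionTokens = c.longValue();
        // Both counts must be strictly positive.
        if (promptTokens <= 0 || completionTokens <= 0) return false;
        sink.recordTokenMetrics(modelName, promptTokens, completionTokens);
        return true;
    }
}
```

The checks run in the same order as the cases listed under Tests below: model name, presence and type of the counts, then positivity.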
   
   This also fixes the same latent gap as on `main`: Python-backed chat models 
invoked from the Java action previously recorded **no** token metrics (the path 
bypasses the Java connection's recording); they are now captured once via the 
setup.
   
   ### Tests
   
   - Relocated the base token-metrics test to the setup 
(`BaseChatModelSetupTokenMetricsTest`), mirroring the rename on `main`.
   - New `ChatModelActionTest` covers `recordChatTokenMetrics`: records when 
all keys are present and positive; `Integer`-typed token values still recorded 
via `Number#longValue()`; skips when a key is missing or non-numeric; skips 
when a token is `0` or the model name is empty (Python parity).
   - `./tools/build.sh -j` and module-level `mvn test` on the api and plan 
modules and the 4 covered connections all pass locally.
   
   ### API
   
   Adds `public BaseChatModelSetup.recordTokenMetrics(String, long, long)` and 
removes the previously `protected` 
`BaseChatModelConnection.recordTokenMetrics(...)`. No change to public 
configuration, event, or resource APIs. Emitted metric names and paths are 
unchanged.
   
   ### Documentation
   
   - [ ] `doc-needed`
   - [x] `doc-not-needed`
   - [ ] `doc-included`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
