joeyutong opened a new issue, #858: URL: https://github.com/apache/flink-agents/issues/858
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/flink-agents/issues) and found nothing similar.

### Description

Embedding model calls currently do not consistently report token usage metrics.

Chat models already have a token accounting path: provider usage is attached to the chat response and later recorded as model-level token metrics. Embedding models return only vectors, so provider usage returned by OpenAI-compatible or DashScope-style embedding APIs can be dropped before it reaches the metrics layer.

Affected paths include:

- Direct Java or Python embedding model calls.
- Vector store and RAG paths that auto-generate embeddings during `add`, `update`, or `query`.
- Cross-language resource paths where the wrapper may receive an action metric group but the provider-side embedding resource performs the actual request.

This makes it harder to validate and compare embedding model cost/usage, especially when a job mixes chat, embedding, and vector store operations.

Embedding metrics do not need `completionTokens`, but should expose input-side token usage, for example `promptTokens` and `totalTokens`, under the same model/provider metric dimensions used by chat metrics where possible.

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!
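
To illustrate the direction described in the issue, here is a minimal Python sketch of carrying provider usage alongside the vectors and recording it as input-side token metrics. All names (`EmbeddingResult`, `TokenMetrics`, `parse_embedding_response`, `record_embedding_usage`) are hypothetical and not part of the flink-agents API. The payload shape is an assumption: an OpenAI-style embedding response whose `usage` object carries `prompt_tokens` and `total_tokens`. Other providers may use different field names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class EmbeddingResult:
    """Hypothetical container: vectors plus optional provider-reported usage."""
    vectors: List[List[float]]
    prompt_tokens: Optional[int] = None
    total_tokens: Optional[int] = None


@dataclass
class TokenMetrics:
    """Stand-in for a metric group: counters keyed by (model, metric name)."""
    counters: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def add(self, model: str, name: str, value: int) -> None:
        key = (model, name)
        self.counters[key] = self.counters.get(key, 0) + value


def parse_embedding_response(payload: dict) -> EmbeddingResult:
    """Keep the usage block instead of returning only the vectors.

    Assumes an OpenAI-style payload:
    {"data": [{"embedding": [...]}, ...],
     "usage": {"prompt_tokens": int, "total_tokens": int}}
    """
    vectors = [item["embedding"] for item in payload.get("data", [])]
    usage = payload.get("usage") or {}
    return EmbeddingResult(
        vectors=vectors,
        prompt_tokens=usage.get("prompt_tokens"),
        total_tokens=usage.get("total_tokens"),
    )


def record_embedding_usage(
    metrics: TokenMetrics, model: str, result: EmbeddingResult
) -> None:
    """Record input-side usage only; embeddings have no completionTokens."""
    if result.prompt_tokens is not None:
        metrics.add(model, "promptTokens", result.prompt_tokens)
    if result.total_tokens is not None:
        metrics.add(model, "totalTokens", result.total_tokens)


if __name__ == "__main__":
    # Hand-written sample payload, not real provider output.
    sample = {
        "data": [{"embedding": [0.1, 0.2, 0.3]}],
        "usage": {"prompt_tokens": 4, "total_tokens": 4},
    }
    metrics = TokenMetrics()
    record_embedding_usage(
        metrics, "example-embedding-model", parse_embedding_response(sample)
    )
    print(metrics.counters)
```

Because usage travels with the result rather than through a side channel, the same recording step could in principle be called from direct embedding calls and from vector store paths that generate embeddings internally. When a provider omits usage, the sketch records nothing rather than reporting zeros.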
