weiqingy opened a new pull request, #843:
URL: https://github.com/apache/flink-agents/pull/843

   Linked issue: #280
   
   ### Purpose of change
   
   Today `output_schema` is honored only by prompt-engineering the request and 
parsing the response text; no chat-model integration uses a provider's native 
structured-output API, which is the most reliable strategy.
   
   This PR adds the foundation for native structured output at the chat-model 
connection layer, plus the OpenAI implementation, in both Java and Python. It 
is the first in a small stack under #280 (Azure/Ollama, Anthropic, and 
DashScope follow in separate PRs; wiring native output into the ReActAgent 
final-output flow is a separate follow-up).
   
   How it works:
   
   - The request's output schema is carried to the connection through a 
reserved key (`__structured_output_schema__`) in the existing 
`modelParams`/`kwargs` map, so the abstract `chat()` signature is unchanged.
   - Each connection declares a boolean native-structured-output capability 
(`supportsNativeStructuredOutput()` / `supports_native_structured_output`), 
default `false`.
   - A connection applies the native API only when: a schema is present, no 
tools are bound on the call, the schema is a POJO (Java) / `BaseModel` (Python) 
rather than a `RowTypeInfo`, and the setup is same-language.
   - The reserved key is always removed before the SDK call so it cannot leak 
into a provider request.
   - The prompt-engineered path is retained as the fallback and is unaffected. 
In the ReAct loop tools are always bound, so the native path stays dormant 
there and existing behavior is unchanged.
   
   OpenAI applies `response_format` json_schema with strict validation. The 
other connections only strip the reserved key for now; their native paths 
arrive in later PRs.
   
   The same-language guard matters because native structured output cannot work 
across the language boundary (a Java `Class` is not a Python `BaseModel`), so 
the schema object is never marshaled across the Pemja bridge.
   
   ### Tests
   
   Unit tests with the provider SDK mocked (no network):
   
   - OpenAI native applied when a schema is present and no tools are bound 
(Java + Python); the SDK request carries `response_format` json_schema strict.
   - Native NOT applied when tools are bound (the no-regression gate), and NOT 
applied for a `RowTypeInfo` schema.
   - The reserved key never leaks to a provider SDK (Python connections forward 
`**kwargs`, so each strips it; a direct unit test of the base pop helper covers 
removal).
   - Same-language threading guard: a cross-language setup with an 
`output_schema` does not receive the reserved key; a same-language setup does.
   - Existing ReActAgent prompt-path tests remain green unchanged.
   
   ### API
   
   Yes — additive only. `BaseChatModelConnection` gains a public reserved-key 
constant and a `protected` capability method (default `false`); no existing 
signatures change.
   
   ### Documentation
   
   - [ ] `doc-needed`
   - [x] `doc-not-needed`
   - [ ] `doc-included`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to