xintongsong commented on code in PR #406:
URL: https://github.com/apache/flink-agents/pull/406#discussion_r2658774810
##########
python/flink_agents/plan/actions/chat_model_action.py:
##########
@@ -51,54 +185,45 @@ def chat(
"BaseChatModelSetup", ctx.get_resource(model, ResourceType.CHAT_MODEL)
)
+ error_handling_strategy =
ctx.config.get(AgentConfigOptions.ERROR_HANDLING_STRATEGY)
+ num_retries = 0
+ if error_handling_strategy == ErrorHandlingStrategy.RETRY:
+ num_retries = max(0, ctx.config.get(AgentConfigOptions.MAX_RETRIES))
+
# TODO: support async execution of chat.
- response = chat_model.chat(messages)
- short_term_memory = ctx.short_term_memory
-
- # generate tool request event according tool calls in response
- if len(response.tool_calls) > 0:
- # TODO: Because memory doesn't support remove currently, so we use
- # dict to store tool context in memory and remove the specific
- # tool context from dict after consuming. This will cause write and
- # read amplification for we need get the whole dict and overwrite it
- # to memory each time we update a specific tool context.
- # After memory supports remove, we can use
"TOOL_CALL_CONTEXT/request_id"
- # to store and remove the specific tool context directly.
-
- # save tool call context
- tool_call_context = short_term_memory.get(_TOOL_CALL_CONTEXT)
- if not tool_call_context:
- tool_call_context = {}
- if initial_request_id not in tool_call_context:
- tool_call_context[initial_request_id] = copy.deepcopy(messages)
- # append response to tool call context
- tool_call_context[initial_request_id].append(response)
- # update tool call context
- short_term_memory.set(_TOOL_CALL_CONTEXT, tool_call_context)
-
- tool_request_event = ToolRequestEvent(
- model=model,
- tool_calls=response.tool_calls,
+ response = None
+ for attempt in range(num_retries + 1):
+ try:
+ response = chat_model.chat(messages)
+ if output_schema is not None and len(response.tool_calls) == 0:
+ response = _generate_structured_output(response, output_schema)
+ except Exception as e: # noqa: PERF203
+ if error_handling_strategy == ErrorHandlingStrategy.IGNORE:
+ _logger.warning(
+ f"Chat request {initial_request_id} failed with error:
{e}, ignored."
+ )
+ return
+ elif error_handling_strategy == ErrorHandlingStrategy.RETRY:
+ if attempt == num_retries:
+ raise
+ _logger.warning(
+ f"Chat request {initial_request_id} failed with error:
{e}, retrying {attempt} / {num_retries}."
+ )
+ else:
+ _logger.debug(
+ f"Chat request {initial_request_id} failed, the input chat
messages are {messages}."
+ )
+ raise
+
+ if (
+ len(response.tool_calls) > 0
+ ): # generate tool request event according tool calls in response
+ _handle_tool_calls(
+ response, initial_request_id, model, messages, output_schema, ctx
)
+ else: # if there is no tool call generated, return chat response directly
+ _clear_tool_call_context(ctx.sensory_memory, initial_request_id)
Review Comment:
Do we still need this when using sensory memory?
##########
api/src/test/java/org/apache/flink/agents/api/resource/ResourceDescriptorTest.java:
##########
@@ -18,8 +18,8 @@
package org.apache.flink.agents.api.resource;
-import org.apache.flink.agents.api.agents.Agent;
import org.apache.flink.agents.api.InputEvent;
+import org.apache.flink.agents.api.agents.Agent;
Review Comment:
There are quite a few changes here that belong to a different commit
##########
python/flink_agents/api/core_options.py:
##########
@@ -77,3 +89,15 @@ class AgentConfigOptions(metaclass=AgentConfigOptionsMeta):
config_type=str,
default=None,
)
+
+ ERROR_HANDLING_STRATEGY = ConfigOption(
+ key="error-handling-strategy",
+ config_type=ErrorHandlingStrategy,
+ default=ErrorHandlingStrategy.FAIL,
+ )
+
+ MAX_RETRIES = ConfigOption(
+ key="max-retries",
+ config_type=int,
+ default=0,
Review Comment:
Shall we enable this by default?
##########
python/flink_agents/plan/actions/chat_model_action.py:
##########
@@ -107,55 +232,65 @@ def chat(
)
+def _process_chat_request(event: ChatRequestEvent, ctx: RunnerContext) -> None:
+ """Process chat request event."""
+ chat(
+ initial_request_id=event.id,
+ model=event.model,
+ messages=event.messages,
+ output_schema=event.output_schema,
+ ctx=ctx,
+ )
+
+
+def _process_tool_response(event: ToolResponseEvent, ctx: RunnerContext) ->
None:
+ """Organize the tool call context and return it to the LLM."""
+ sensory_memory = ctx.sensory_memory
+ request_id = event.request_id
+
+ # get correspond tool request event context
+ tool_request_event_context = _remove_tool_request_event_context(
+ sensory_memory, request_id
+ )
Review Comment:
Same here. Since we're using sensory memory, do we still need to remove the
context? This brings an extra memory update.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]