Re: [PR] feat(ai-lakera-guard): scan LLM responses (direction output/both, non-streaming + streaming) [apisix]

via GitHub Wed, 24 Jun 2026 22:45:43 -0700


nic-6443 commented on code in PR #13606:
URL: https://github.com/apache/apisix/pull/13606#discussion_r3472147278



##########
apisix/plugins/ai-lakera-guard.lua:
##########
@@ -206,4 +215,60 @@ function _M.access(conf, ctx)
 end
 
 
+function _M.lua_body_filter(conf, ctx, headers, body)
+    if conf.direction ~= "output" and conf.direction ~= "both" then
+        return
+    end
+
+    if ngx.status >= 400 then
+        return
+    end
+
+    -- Non-streaming: ai-proxy hands us the fully-assembled completion text.
+    if ctx.var.request_type == "ai_chat" then
+        local text = ctx.var.llm_response_text
+        if not text or text == "" then
+            return
+        end
+        local messages = { { role = "assistant", content = text } }
+        return moderate(ctx, conf, messages, "response", 
conf.response_failure_message)
+    end
+
+    -- Streaming: lua_body_filter is invoked once per upstream chunk. We cannot
+    -- scan a partial completion and we must not let flagged tokens reach the
+    -- client, so we buffer every chunk (withholding it with an empty body) and
+    -- scan the assembled completion once at end-of-stream. This trades
+    -- incremental delivery for true blocking.
+    if ctx.var.request_type == "ai_stream" then
+        local buffer = ctx.lakera_response_buffer
+        if not buffer then
+            buffer = {}
+            ctx.lakera_response_buffer = buffer
+        end
+        buffer[#buffer + 1] = body or ""
+
+        if not ctx.var.llm_request_done then
+            return nil, ":\n\n"
+        end
+
+        -- End of stream: ai-proxy has assembled the full completion text.
+        local text = ctx.var.llm_response_text
+        if not text or text == "" then

Review Comment:
   This path silently fails *open*, which is surprising for a guard that 
defaults to fail-closed everywhere else. The subtlety is that 
`ctx.var.llm_response_text` is published only on a `usage` SSE event 
(`ai-providers/base.lua` sets it inside `if parsed.usage then`), not on a plain 
`[DONE]`. ai-proxy injects `stream_options.include_usage=true`, but lots of 
OpenAI-compatible / self-hosted providers ignore it and stream content followed 
by `[DONE]` with no usage chunk. In that case `buffer` still holds the real 
(possibly flagged) completion, but `text` is nil, so `return nil, 
concat(buffer)` releases it to the client unscanned — and with no log line 
saying scanning was skipped. The non-streaming branch (line 230) has the same 
shape; lower impact there since nothing is withheld.
   
   Every test passes because the fixtures always carry a usage chunk, so this 
gap is invisible in CI. Could this honor `fail_open` (block by default) when 
there's buffered content but no assembled text, instead of releasing — or at 
least emit a warn so the skip is observable?



##########
apisix/plugins/ai-lakera-guard.lua:
##########
@@ -206,4 +215,60 @@ function _M.access(conf, ctx)
 end
 
 
+function _M.lua_body_filter(conf, ctx, headers, body)
+    if conf.direction ~= "output" and conf.direction ~= "both" then
+        return
+    end
+
+    if ngx.status >= 400 then
+        return
+    end
+
+    -- Non-streaming: ai-proxy hands us the fully-assembled completion text.
+    if ctx.var.request_type == "ai_chat" then
+        local text = ctx.var.llm_response_text
+        if not text or text == "" then
+            return
+        end
+        local messages = { { role = "assistant", content = text } }
+        return moderate(ctx, conf, messages, "response", 
conf.response_failure_message)
+    end
+
+    -- Streaming: lua_body_filter is invoked once per upstream chunk. We cannot
+    -- scan a partial completion and we must not let flagged tokens reach the
+    -- client, so we buffer every chunk (withholding it with an empty body) and
+    -- scan the assembled completion once at end-of-stream. This trades
+    -- incremental delivery for true blocking.
+    if ctx.var.request_type == "ai_stream" then

Review Comment:
   In `alert` (shadow) mode this still buffers the whole stream and withholds 
every chunk until end-of-stream, then releases it all at once. The docs frame 
`alert` as a non-intrusive, log-only pass-through, but on streaming routes the 
client loses token-by-token delivery and instead receives the full body in one 
shot once scanning finishes. So someone who sets `direction: output, action: 
alert` precisely to observe Lakera verdicts *without* affecting traffic does 
change the observable latency/streaming behavior.
   
   Would it make sense to skip the buffering when `action == "alert"` (let 
chunks flow through live and scan a copy at the end), or at least document the 
streaming caveat for shadow mode?



##########
apisix/plugins/ai-lakera-guard.lua:
##########
@@ -206,4 +215,60 @@ function _M.access(conf, ctx)
 end
 
 
+function _M.lua_body_filter(conf, ctx, headers, body)
+    if conf.direction ~= "output" and conf.direction ~= "both" then
+        return
+    end
+
+    if ngx.status >= 400 then
+        return
+    end
+
+    -- Non-streaming: ai-proxy hands us the fully-assembled completion text.
+    if ctx.var.request_type == "ai_chat" then
+        local text = ctx.var.llm_response_text
+        if not text or text == "" then
+            return
+        end
+        local messages = { { role = "assistant", content = text } }
+        return moderate(ctx, conf, messages, "response", 
conf.response_failure_message)
+    end
+
+    -- Streaming: lua_body_filter is invoked once per upstream chunk. We cannot
+    -- scan a partial completion and we must not let flagged tokens reach the
+    -- client, so we buffer every chunk (withholding it with an empty body) and
+    -- scan the assembled completion once at end-of-stream. This trades
+    -- incremental delivery for true blocking.
+    if ctx.var.request_type == "ai_stream" then
+        local buffer = ctx.lakera_response_buffer
+        if not buffer then
+            buffer = {}
+            ctx.lakera_response_buffer = buffer
+        end
+        buffer[#buffer + 1] = body or ""
+
+        if not ctx.var.llm_request_done then

Review Comment:
   The buffer is only ever released when `llm_request_done` is observed. The 
note in the docs covers a dropped connection, but ai-proxy's runaway safeguards 
(`max_stream_duration_ms` / `max_response_bytes`) also set `llm_request_done = 
true` and then `return` without dispatching another body_filter pass — so the 
buffered content is stranded and never released. That hits *clean* responses 
too, not just flagged ones: the client ends up with only the `:\n\n` heartbeats 
and no `[DONE]`, even though the response was fine and merely tripped a 
size/duration cap.
   
   Might be worth flushing (and scanning) whatever is buffered on that abort 
path, or at least widening the doc caveat beyond "dropped connection" to 
include the gateway-side safeguards.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] feat(ai-lakera-guard): scan LLM responses (direction output/both, non-streaming + streaming) [apisix]

Reply via email to