janiussyafiq opened a new pull request, #13355:
URL: https://github.com/apache/apisix/pull/13355

   ### Description
   
   This PR introduces `ai-lakera-guard`, a new AI security plugin that scans 
LLM prompts and responses for prompt injection, jailbreaks, PII, profanity, and 
unknown-link exfiltration via Lakera Guard's `/v2/guard` API. It fills the gap 
between the existing regex-based `ai-prompt-guard` and the general-purpose 
toxicity scorers (`ai-aws-content-moderation`, `ai-aliyun-content-moderation`) 
by adding a prompt-security-native commercial vendor integration. Kong already 
ships an analogous `ai-lakera-guard`; LiteLLM supports it as a guardrail 
backend.
   
   This is the **walking skeleton** of a multi-PR series tracked under #13291. 
It lands the foundation and the request-side block path on the `openai-chat` 
protocol; output-direction scanning, the other three protocols, streaming, 
alert mode, `fail_open`, `$secret://` integration, `project_id` forwarding, 
`reveal_failure_categories`, and user documentation will follow in subsequent 
PRs.
   
   **Plugin priority:** 1028 — slots into the existing AI plugin chain just 
below `ai-aliyun-content-moderation` (1029).
   
   **Modules added:**
   
   | File | Responsibility |
   |---|---|
   | `apisix/plugins/ai-lakera-guard.lua` | Phase entry points (`_M.access`, 
plus `_M.lua_body_filter` and `_M.log` no-op stubs for later slices), plugin 
metadata. Uses `apisix.plugins.ai-protocols.<proto>.build_deny_response` so SDK 
clients see well-formed completion-shape deny bodies instead of free-form HTTP 
error strings. |
   | `apisix/plugins/ai-lakera-guard/schema.lua` | Full JSON Schema for the 
plugin (every field that later slices will exercise lands here, so subsequent 
PRs add behavior rather than new fields). `encrypt_fields = 
["endpoint.api_key"]` enables APISIX's at-rest encryption and `$secret://` 
resolution. |
   | `apisix/plugins/ai-lakera-guard/client.lua` | HTTP client for `/v2/guard` 
with connection-pool reuse via `resty.http` + `set_keepalive`, so each scan 
doesn't burn a fresh TLS handshake to `api.lakera.ai`. Narrow exported surface: 
`scan(conf, messages, project_id) → (flagged, detector_types, err)`. |
   
   **Key design choices, with rationale documented in the proposal/issue:**
   
   - **Default `on_block.status = 200`** with a completion-shape deny body — 
matching `ai-aliyun-content-moderation`'s `deny_code = 200`. Provider SDKs 
raise on 4xx before parsing the body, so a completion-shape deny under 4xx 
would be unreachable to the calling application. Operators wanting 4xx can 
override.
   - **Default `response_buffer_size = 128`** bytes — matches 
`ai-aliyun-content-moderation`'s `stream_check_cache_size` and is in the same 
neighborhood as Kong's `ai-lakera-guard` default of 100 bytes. (Used in the 
streaming slice — schema lands here so the value is consistent across slices.)
   - **`fail_open` defaults to `false`** (safety-first) — matches the project 
tracking issue's design. (Behavior wired in a later slice; schema lands here.)
   - **Full schema upfront** — every field that subsequent slices need is in 
the schema as of this PR. This means slices S2 onward modify only behavior, not 
the schema shape, and reduces churn for operators evaluating the plugin while 
it's incrementally built out.
   
   **Tests** (`t/plugin/ai-lakera-guard.t`):
   
   | TEST | Behavior |
   |---|---|
   | 1 | Schema accepts a minimal valid config (only `endpoint.api_key`) via 
the Admin API. |
   | 2 | Schema rejects a missing `endpoint.api_key` with 400 and the standard 
plugin-validation error message. |
   | 3 | A route admits `ai-proxy` and `ai-lakera-guard` configured together. |
   | 4 | Clean prompt — mock Lakera returns `flagged: false`; the original LLM 
response passes through to the client unchanged. An `error_log` assertion 
confirms the Lakera scan actually happened (the plugin is not a silent no-op). |
   | 5 | Flagged prompt — mock Lakera returns `flagged: true`; response is a 
completion-shape deny under the default 200 status. |
   | 6 | Route admits with `on_block.status: 400` override. |
   | 7 | Under the override, a flagged prompt returns the deny under 400. |
   
   The inline mock Lakera server on port 6724 in the test file's `http_config` 
preprocessor branches by request body content (substring `"kill"` triggers the 
flagged fixture). Fixtures under `t/fixtures/lakera/` mirror the real 
`/v2/guard` response shape (`project_id`, `policy_id`, `detector_id`, 
`detector_type`, `detected`, `message_id`).
   
   #### Which issue(s) this PR fixes:
   
   Partial implementation of #13291 (walking skeleton). Subsequent PRs will 
land the remaining capabilities — output-direction scanning, all four 
protocols, streaming response scan, alert mode, `fail_open`, secrets, 
`project_id`, `reveal_failure_categories`, and user documentation.
   
   Related follow-up issues filed during the design phase for cross-cutting 
concerns affecting all AI moderation plugins: #13352 (extend 
`extract_request_content` to cover Anthropic top-level `system:` and `tools[]`) 
and #13353 (shared "moderation-decided" ctx flag for sibling-plugin 
coordination). Both are out of scope for this PR — `ai-lakera-guard` inherits 
the gaps from #13352 and documents the limitation in user-facing docs (landed 
in a later slice).
   
   ### Checklist
   
   - [x] I have explained the need for this PR and the problem it solves
   - [x] I have explained the changes or the new features added to this PR
   - [x] I have added tests corresponding to this change
   - [ ] I have updated the documentation to reflect this change
   - [x] I have verified that this change is backward compatible (If not, 
please discuss on the [APISIX mailing 
list](https://github.com/apache/apisix/tree/master#community) first)
   
   User documentation (`docs/en/latest/plugins/ai-lakera-guard.md`) will land 
in a later slice of this series once the user-facing surface — alert mode, 
`fail_open`, secrets, optional fields — is fully wired up. The plugin is fully 
backward compatible: it's a new addition with no existing users to break.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to