janiussyafiq opened a new pull request, #13355: URL: https://github.com/apache/apisix/pull/13355
### Description This PR introduces `ai-lakera-guard`, a new AI security plugin that scans LLM prompts and responses for prompt injection, jailbreaks, PII, profanity, and unknown-link exfiltration via Lakera Guard's `/v2/guard` API. It fills the gap between the existing regex-based `ai-prompt-guard` and the general-purpose toxicity scorers (`ai-aws-content-moderation`, `ai-aliyun-content-moderation`) by adding a prompt-security-native commercial vendor integration. Kong already ships an analogous `ai-lakera-guard`; LiteLLM supports it as a guardrail backend. This is the **walking skeleton** of a multi-PR series tracked under #13291. It lands the foundation and the request-side block path on the `openai-chat` protocol; output-direction scanning, the other three protocols, streaming, alert mode, `fail_open`, `$secret://` integration, `project_id` forwarding, `reveal_failure_categories`, and user documentation will follow in subsequent PRs. **Plugin priority:** 1028 — slots into the existing AI plugin chain just below `ai-aliyun-content-moderation` (1029). **Modules added:** | File | Responsibility | |---|---| | `apisix/plugins/ai-lakera-guard.lua` | Phase entry points (`_M.access`, plus `_M.lua_body_filter` and `_M.log` no-op stubs for later slices), plugin metadata. Uses `apisix.plugins.ai-protocols.<proto>.build_deny_response` so SDK clients see well-formed completion-shape deny bodies instead of free-form HTTP error strings. | | `apisix/plugins/ai-lakera-guard/schema.lua` | Full JSON Schema for the plugin (every field that later slices will exercise lands here, so subsequent PRs add behavior rather than new fields). `encrypt_fields = ["endpoint.api_key"]` enables APISIX's at-rest encryption and `$secret://` resolution. | | `apisix/plugins/ai-lakera-guard/client.lua` | HTTP client for `/v2/guard` with connection-pool reuse via `resty.http` + `set_keepalive`, so each scan doesn't burn a fresh TLS handshake to `api.lakera.ai`. Narrow exported surface: `scan(conf, messages, project_id) → (flagged, detector_types, err)`. | **Key design choices, with rationale documented in the proposal/issue:** - **Default `on_block.status = 200`** with a completion-shape deny body — matching `ai-aliyun-content-moderation`'s `deny_code = 200`. Provider SDKs raise on 4xx before parsing the body, so a completion-shape deny under 4xx would be unreachable to the calling application. Operators wanting 4xx can override. - **Default `response_buffer_size = 128`** bytes — matches `ai-aliyun-content-moderation`'s `stream_check_cache_size` and is in the same neighborhood as Kong's `ai-lakera-guard` default of 100 bytes. (Used in the streaming slice — schema lands here so the value is consistent across slices.) - **`fail_open` defaults to `false`** (safety-first) — matches the project tracking issue's design. (Behavior wired in a later slice; schema lands here.) - **Full schema upfront** — every field that subsequent slices need is in the schema as of this PR. This means slices S2 onward modify only behavior, not the schema shape, and reduces churn for operators evaluating the plugin while it's incrementally built out. **Tests** (`t/plugin/ai-lakera-guard.t`): | TEST | Behavior | |---|---| | 1 | Schema accepts a minimal valid config (only `endpoint.api_key`) via the Admin API. | | 2 | Schema rejects a missing `endpoint.api_key` with 400 and the standard plugin-validation error message. | | 3 | A route admits `ai-proxy` and `ai-lakera-guard` configured together. | | 4 | Clean prompt — mock Lakera returns `flagged: false`; the original LLM response passes through to the client unchanged. An `error_log` assertion confirms the Lakera scan actually happened (the plugin is not a silent no-op). | | 5 | Flagged prompt — mock Lakera returns `flagged: true`; response is a completion-shape deny under the default 200 status. | | 6 | Route admits with `on_block.status: 400` override. | | 7 | Under the override, a flagged prompt returns the deny under 400. | The inline mock Lakera server on port 6724 in the test file's `http_config` preprocessor branches by request body content (substring `"kill"` triggers the flagged fixture). Fixtures under `t/fixtures/lakera/` mirror the real `/v2/guard` response shape (`project_id`, `policy_id`, `detector_id`, `detector_type`, `detected`, `message_id`). #### Which issue(s) this PR fixes: Partial implementation of #13291 (walking skeleton). Subsequent PRs will land the remaining capabilities — output-direction scanning, all four protocols, streaming response scan, alert mode, `fail_open`, secrets, `project_id`, `reveal_failure_categories`, and user documentation. Related follow-up issues filed during the design phase for cross-cutting concerns affecting all AI moderation plugins: #13352 (extend `extract_request_content` to cover Anthropic top-level `system:` and `tools[]`) and #13353 (shared "moderation-decided" ctx flag for sibling-plugin coordination). Both are out of scope for this PR — `ai-lakera-guard` inherits the gaps from #13352 and documents the limitation in user-facing docs (landed in a later slice). ### Checklist - [x] I have explained the need for this PR and the problem it solves - [x] I have explained the changes or the new features added to this PR - [x] I have added tests corresponding to this change - [ ] I have updated the documentation to reflect this change - [x] I have verified that this change is backward compatible (If not, please discuss on the [APISIX mailing list](https://github.com/apache/apisix/tree/master#community) first) User documentation (`docs/en/latest/plugins/ai-lakera-guard.md`) will land in a later slice of this series once the user-facing surface — alert mode, `fail_open`, secrets, optional fields — is fully wired up. The plugin is fully backward compatible: it's a new addition with no existing users to break. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
