nic-6443 opened a new pull request, #13598: URL: https://github.com/apache/apisix/pull/13598
### Description The `ai-aliyun-content-moderation` plugin had two hot-path inefficiencies in the moderation path, and it re-moderated the entire conversation on every request. This PR fixes the performance issues and adds a moderation-scope option. **1. O(n²) → O(n) content chunking.** `content_moderation()` splits text into `*_check_length_limit`-sized chunks. It used `utf8.sub(content, index, ...)`, which locates the `index`-th character by scanning from the start of the string on every chunk, making the loop O(n²) in content length. Replaced with a byte cursor + `utf8.offset(content, length_limit + 1, cur)` (scans only the current chunk window) + byte-based `string.sub`, giving O(n). Output is byte-identical to the previous chunking (verified for ASCII and multibyte text). **2. Faster RFC-3986 signing.** `url_encoding()` ran five separate Lua `string.gsub` passes over the percent-encoded chunk to escape the sub-delimiters `! ' ( ) *`. Profiling a ~2 KB chunk showed this single function at ~148 µs — 30–40× more than every other per-chunk operation (`json.encode`/`hmac_sha1`/`encode_args` are all ~4–5 µs). Replaced with one JIT-compiled `ngx.re.gsub(escaped, "[!'()*]", repl, "jo")` pass (~7 µs, same output). **3. Schema guard.** Added `minimum = 1` to `request_check_length_limit` and `response_check_length_limit`. A non-positive value made the chunking loop never advance (`utf8.offset(content, length_limit + 1, cur)` returns `cur`), an infinite loop. It is now rejected at config time, consistent with `stream_check_cache_size`. **4. `request_check_mode` (`last` | `all`).** Request moderation now targets user input only. By default (`last`) it moderates just the latest consecutive block of `user` messages (the newest user turn) instead of the whole conversation, so multi-turn requests no longer re-send the entire history to the moderation service every turn. `all` moderates every `user` message. Both modes ignore `system`/`assistant`/`tool` messages (the query moderation service is meant for user input). Note this changes the previous behavior, which moderated all messages regardless of role. #### Performance Single `user` message, `ai-proxy` + `ai-aliyun-content-moderation`, local mock moderation endpoint that always passes, `wrk -t1 -c1`. Baseline = `ai-proxy` only: | message size | baseline QPS | before QPS | after QPS | before→after | |---|---:|---:|---:|---:| | 64k | 1928 | 71.8 | 162.4 | 2.3x | | 128k | 1377 | 30.3 | 84.6 | 2.8x | | 256k | 899 | 11.3 | 43.1 | 3.8x | | 512k | 508 | 3.75 | 21.9 | 5.9x | | 1M | 266 | 1.10 | 11.2 | 10.1x | The `before→after` gain grows with size because the O(n²) term is removed; the single-pass encoding adds a roughly uniform ~2× on top. #### Tests Added cases to `t/plugin/ai-aliyun-content-moderation.t`: `last`/`all` mode behavior, role-awareness (non-user messages are skipped; a non-user last message means no user turn to check), multi-chunk detection across a multibyte boundary, and schema rejection of `request_check_length_limit: 0`. ### Checklist - [x] I have explained the need for this PR and the problem it solves - [x] I have added the relevant tests - [x] I have updated the documentation (en + zh) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
