nic-6443 opened a new pull request, #13598:
URL: https://github.com/apache/apisix/pull/13598

   ### Description
   
   The `ai-aliyun-content-moderation` plugin had two inefficiencies on its 
moderation hot path, and it re-moderated the entire conversation on every 
request. This PR fixes both performance issues, adds a schema guard for the 
length limits, and adds a moderation-scope option.
   
   **1. O(n²) → O(n) content chunking.** `content_moderation()` splits text 
into `*_check_length_limit`-sized chunks. It used `utf8.sub(content, index, 
...)`, which locates the `index`-th character by scanning from the start of the 
string on every chunk, making the loop O(n²) in content length. Replaced with a 
byte cursor + `utf8.offset(content, length_limit + 1, cur)` (scans only the 
current chunk window) + byte-based `string.sub`, giving O(n). Output is 
byte-identical to the previous chunking (verified for ASCII and multibyte text).
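
The cursor idea can be sketched outside the plugin. The following Python is only an illustration of the technique, not the plugin's Lua code: `next_boundary` and `chunk_utf8` are hypothetical names, and the sketch assumes the input is valid UTF-8. Each chunk is located by scanning forward from the previous chunk's end, so every byte is visited once.

```python
from typing import List


def next_boundary(data: bytes, start: int, max_chars: int) -> int:
    """Return the byte index just past up to `max_chars` characters from `start`.

    Assumes `data` is valid UTF-8 and `start` sits on a character boundary.
    """
    pos = start
    seen = 0
    n = len(data)
    while pos < n and seen < max_chars:
        pos += 1  # step over the lead byte of the current character
        # Step over UTF-8 continuation bytes (bit pattern 10xxxxxx).
        while pos < n and (data[pos] & 0xC0) == 0x80:
            pos += 1
        seen += 1
    return pos


def chunk_utf8(data: bytes, max_chars: int) -> List[bytes]:
    """Split UTF-8 bytes into chunks of at most `max_chars` characters."""
    if max_chars < 1:
        raise ValueError("max_chars must be >= 1")
    chunks = []
    cur = 0  # byte cursor: only the current window is scanned per chunk
    while cur < len(data):
        end = next_boundary(data, cur, max_chars)
        chunks.append(data[cur:end])
        cur = end
    return chunks
```

Because the cursor never rewinds, total work is proportional to the byte length, whereas re-locating the `index`-th character from the start for every chunk repeats the scan of everything already emitted.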
   
   **2. Faster RFC-3986 signing.** `url_encoding()` ran five separate Lua 
`string.gsub` passes over the percent-encoded chunk to escape the 
sub-delimiters `! ' ( ) *`. Profiling a ~2 KB chunk showed this function alone 
taking ~148 µs, 30–40× more than any other per-chunk operation 
(`json.encode`/`hmac_sha1`/`encode_args` are all ~4–5 µs). Replaced with one 
JIT-compiled `ngx.re.gsub(escaped, "[!'()*]", repl, "jo")` pass (~7 µs, same 
output).
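
As a rough illustration of why one pass can replace five, here is a Python sketch of the two strategies. It is not the plugin's code: `urllib.parse.quote(..., safe="!'()*")` merely stands in for a first-stage encoder that leaves those five characters untouched, and the function names are hypothetical. The sketch shows equivalence of output only and makes no timing claim.

```python
import re
from urllib.parse import quote

# One character class covering the five characters escaped after the first stage.
_SUB_DELIMS = re.compile(r"[!'()*]")


def _percent(match):
    # Map a single matched character to its %XX form, e.g. "!" -> "%21".
    return "%{:02X}".format(ord(match.group(0)))


def encode_multi_pass(s: str) -> str:
    # Stand-in first stage (assumption): percent-encode but keep !'()* as-is.
    escaped = quote(s, safe="!'()*")
    # Five full passes over the string, one per character.
    for ch in "!'()*":
        escaped = escaped.replace(ch, "%{:02X}".format(ord(ch)))
    return escaped


def encode_single_pass(s: str) -> str:
    escaped = quote(s, safe="!'()*")
    # One pass: the callback picks the replacement for whichever character matched.
    return _SUB_DELIMS.sub(_percent, escaped)
```

The replacements (`%21`, `%27`, `%28`, `%29`, `%2A`) contain none of the five characters themselves, which is why the order of the five passes never mattered and a single pass yields the same string.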
   
   **3. Schema guard.** Added `minimum = 1` to `request_check_length_limit` and 
`response_check_length_limit`. With a non-positive value the chunking loop 
never advanced (`utf8.offset(content, length_limit + 1, cur)` returns `cur`), 
so it looped forever. Such values are now rejected at config time, consistent 
with `stream_check_cache_size`.
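
To make the effect of the guard concrete, the sketch below mimics a config-time check in Python. Only the two field names and `minimum = 1` come from this PR; the `"type": "integer"` entries, the `validate` helper, and the error wording are assumptions made for illustration, and the real plugin relies on APISIX's own schema validation.

```python
# Hypothetical schema fragment: only the field names and "minimum": 1 are from the PR.
SCHEMA_FRAGMENT = {
    "request_check_length_limit": {"type": "integer", "minimum": 1},
    "response_check_length_limit": {"type": "integer", "minimum": 1},
}


def validate(conf: dict) -> list:
    """Return a list of error strings; an empty list means the config passes."""
    errors = []
    for key, rule in SCHEMA_FRAGMENT.items():
        if key not in conf:
            continue  # absent fields are left to defaults (assumption)
        value = conf[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            errors.append("{}: expected an integer".format(key))
        elif value < rule["minimum"]:
            errors.append("{}: must be >= {}".format(key, rule["minimum"]))
    return errors
```

Rejecting the value before any request is served means the chunking loop can rely on a positive limit and needs no per-request check.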
   
   **4. `request_check_mode` (`last` | `all`).** Request moderation now targets 
user input only. By default (`last`) it moderates just the latest consecutive 
block of `user` messages (the newest user turn) instead of the whole 
conversation, so multi-turn requests no longer re-send the entire history to 
the moderation service every turn. `all` moderates every `user` message. Both 
modes ignore `system`/`assistant`/`tool` messages (the query moderation service 
is meant for user input). Note this changes the previous behavior, which 
moderated all messages regardless of role.
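
The two modes can be sketched as follows. This Python is an illustration under stated assumptions, not the plugin's implementation: `select_user_content` is a hypothetical name, message `content` is assumed to be a plain string, and `last` is read as "the run of `user` messages at the end of the conversation", which matches the test description where a non-user last message leaves nothing to check.

```python
def select_user_content(messages, mode="last"):
    """Pick the message contents that would be sent for request moderation."""
    if mode == "all":
        # Every user message, in order; other roles are ignored.
        return [m["content"] for m in messages if m.get("role") == "user"]
    if mode == "last":
        picked = []
        # Walk backwards and stop at the first non-user message.
        for m in reversed(messages):
            if m.get("role") != "user":
                break
            picked.append(m["content"])
        picked.reverse()  # restore conversation order
        return picked
    raise ValueError("unknown mode: {!r}".format(mode))
```

For a conversation of `system`, `user` ("q1"), `assistant`, `user` ("q2"), `user` ("q3"), `last` selects "q2" and "q3" while `all` selects "q1", "q2", and "q3".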
   
   #### Performance
   
   Single `user` message, `ai-proxy` + `ai-aliyun-content-moderation`, local 
mock moderation endpoint that always passes, `wrk -t1 -c1`. Baseline = 
`ai-proxy` only:
   
   | message size | baseline QPS | before QPS | after QPS | before→after |
   |---|---:|---:|---:|---:|
   | 64k  | 1928 | 71.8  | 162.4 | 2.3x |
   | 128k | 1377 | 30.3  | 84.6  | 2.8x |
   | 256k | 899  | 11.3  | 43.1  | 3.8x |
   | 512k | 508  | 3.75  | 21.9  | 5.9x |
   | 1M   | 266  | 1.10  | 11.2  | 10.1x |
   
   The `before→after` gain grows with size because the O(n²) term is removed; 
the single-pass encoding adds a roughly uniform ~2× on top.
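
One way to read the table is to look at how QPS changes each time the message size doubles. The short Python below only redoes arithmetic on the numbers above; the interpretation in the comments is a reading of those ratios, not an additional measurement.

```python
# QPS values copied from the table, ordered 64k, 128k, 256k, 512k, 1M.
before = [71.8, 30.3, 11.3, 3.75, 1.10]
after = [162.4, 84.6, 43.1, 21.9, 11.2]


def drop_per_doubling(qps):
    """Factor by which QPS falls each time the message size doubles."""
    return [round(a / b, 2) for a, b in zip(qps, qps[1:])]


before_drop = drop_per_doubling(before)
after_drop = drop_per_doubling(after)

# Linear cost would give a factor near 2 per doubling; quadratic cost tends toward 4.
print("before:", before_drop)
print("after: ", after_drop)
```

The "after" factors all sit just under 2, which is what cost proportional to size would produce, while the "before" factors climb with every doubling, as expected when a quadratic term increasingly dominates.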
   
   #### Tests
   
   Added cases to `t/plugin/ai-aliyun-content-moderation.t`: `last`/`all` mode 
behavior, role-awareness (non-user messages are skipped; a non-user last 
message means no user turn to check), multi-chunk detection across a multibyte 
boundary, and schema rejection of `request_check_length_limit: 0`.
   
   ### Checklist
   
   - [x] I have explained the need for this PR and the problem it solves
   - [x] I have added the relevant tests
   - [x] I have updated the documentation (en + zh)
   

