mochengqian opened a new issue, #956:
URL: https://github.com/apache/dubbo-go-pixiu/issues/956

   ## Background
   
   Follow-up to #932 (issue #905 step 5), which moved LLM proxy cooldown state 
out of `Endpoint.Metadata` into a process-wide `cooldownStore` keyed by 
`cooldownKey`.
   
   In `pkg/filter/llm/proxy/filter.go`:
   
   ```go
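   // cooldownKey identifies one endpoint/credential pair in cooldownStore.
   type cooldownKey struct {
        clusterName     string
        endpointID      string
        endpointAddress string
        credentialHash  string // hex-encoded sha256 of the API key
   }
   
   // endpointCredentialHash returns the hex-encoded sha256 of the endpoint's
   // API key, or "" when no key is configured.
   func endpointCredentialHash(endpoint *model.Endpoint) string {
        if endpoint == nil || endpoint.LLMMeta == nil || endpoint.LLMMeta.APIKey == "" {
                return ""
        }
        sum := sha256.Sum256([]byte(endpoint.LLMMeta.APIKey))
        return fmt.Sprintf("%x", sum)
   }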
   ```
   
   ## Problem
   
   Every LLM request flows through `endpointInCooldown` / 
`markEndpointCooldown` -> `newCooldownKey` -> `endpointCredentialHash` at least 
once (more on the retry/fallback path). Each call:
   
   - recomputes sha256 of the API key, and
   - `fmt.Sprintf("%x", sum)` allocates a fresh 64-byte hex string that escapes 
to the heap.
   
   The hex string is pure overhead — it is only used as part of a comparable 
map key.
   
   ## Goal
   
   Remove the per-request hex-string allocation.
   
   ## Suggested approach
   
   Use the raw `[32]byte` sha256 output directly as the key component (it is a 
comparable array and is valid in a struct map key):
   
   ```go
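   // cooldownKey stays comparable: every field, including the array, supports ==.
   type cooldownKey struct {
        clusterName     string
        endpointID      string
        endpointAddress string
        credentialHash  [32]byte // raw sha256 of the API key
   }
   
   // endpointCredentialHash returns the raw sha256 of the endpoint's API key,
   // or the zero array when no key is configured.
   func endpointCredentialHash(endpoint *model.Endpoint) [32]byte {
        if endpoint == nil || endpoint.LLMMeta == nil || endpoint.LLMMeta.APIKey == "" {
                return [32]byte{}
        }
        return sha256.Sum256([]byte(endpoint.LLMMeta.APIKey))
   }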
   ```
   
   This drops the hex encoding + heap allocation. sha256 itself stays per-call 
(nanoseconds for short keys); the allocation was the dominant cost. Caching the 
hash on the endpoint is explicitly **not** proposed — it would pollute the 
model struct and complicate snapshot cloning for a ~200ns win.
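
As a minimal, self-contained sketch of why the array works as a key component, the program below uses a struct with a `[32]byte` field as a map key. The helpers `credentialHash` and `newKey` are hypothetical stand-ins that take the API key as a plain string; they are not the real `endpointCredentialHash` / `newCooldownKey`, and the cluster, endpoint, and key values are made up.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// cooldownKey mirrors the proposed struct. All fields are comparable,
// so the struct as a whole can be used as a map key.
type cooldownKey struct {
	clusterName     string
	endpointID      string
	endpointAddress string
	credentialHash  [32]byte
}

// credentialHash is a hypothetical stand-in for endpointCredentialHash
// that takes the API key directly instead of a *model.Endpoint.
func credentialHash(apiKey string) [32]byte {
	if apiKey == "" {
		return [32]byte{} // zero value marks "no credential"
	}
	return sha256.Sum256([]byte(apiKey))
}

// newKey is a hypothetical stand-in for newCooldownKey.
func newKey(cluster, id, addr, apiKey string) cooldownKey {
	return cooldownKey{
		clusterName:     cluster,
		endpointID:      id,
		endpointAddress: addr,
		credentialHash:  credentialHash(apiKey),
	}
}

func main() {
	store := map[cooldownKey]bool{}
	store[newKey("c", "e1", "10.0.0.1:443", "sk-a")] = true

	// Same endpoint and credential: the lookup hits.
	fmt.Println(store[newKey("c", "e1", "10.0.0.1:443", "sk-a")]) // true
	// Same endpoint, different credential: a distinct key, so a miss.
	fmt.Println(store[newKey("c", "e1", "10.0.0.1:443", "sk-b")]) // false
	// An empty API key maps to the zero array.
	fmt.Println(credentialHash("") == [32]byte{}) // true
}
```

The last line is the comparison that tests currently asserting on `""` would presumably switch to.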
   
   ## Test plan
   
   - [ ] `BenchmarkCooldown_EndpointInCooldown` over ~100 LLM endpoints, 
before/after — expect at least one fewer alloc/op.
   - [ ] Update existing cooldown tests for the `[32]byte` zero value.
   - [ ] `go test -race ./pkg/filter/llm/proxy/...`
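
A rough way to sanity-check the expected allocation difference before writing the real benchmark is sketched below. It is not `BenchmarkCooldown_EndpointInCooldown`: `hexHash`, `rawHash`, `syntheticKeys`, and `allocsPerCall` are hypothetical helpers that hash made-up API keys directly, without `model.Endpoint` or the cooldown store. It uses `testing.AllocsPerRun` from the standard library; exact counts may vary with the Go version, so no specific numbers are claimed here.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"testing"
)

// hexHash and rawHash are hypothetical stand-ins for the current and
// proposed variants of endpointCredentialHash.
func hexHash(apiKey string) string {
	sum := sha256.Sum256([]byte(apiKey))
	return fmt.Sprintf("%x", sum)
}

func rawHash(apiKey string) [32]byte {
	return sha256.Sum256([]byte(apiKey))
}

// Package-level sinks keep the compiler from discarding the results.
var (
	hexSink string
	rawSink [32]byte
)

// syntheticKeys builds n made-up API keys, one per simulated endpoint.
func syntheticKeys(n int) []string {
	keys := make([]string, n)
	for i := range keys {
		keys[i] = fmt.Sprintf("sk-test-%03d", i)
	}
	return keys
}

// allocsPerCall reports the average number of heap allocations per hash
// call for the hex-string variant and the [32]byte variant.
func allocsPerCall(keys []string) (hexAllocs, rawAllocs float64) {
	i := 0
	hexAllocs = testing.AllocsPerRun(1000, func() {
		hexSink = hexHash(keys[i%len(keys)])
		i++
	})
	j := 0
	rawAllocs = testing.AllocsPerRun(1000, func() {
		rawSink = rawHash(keys[j%len(keys)])
		j++
	})
	return hexAllocs, rawAllocs
}

func main() {
	hexAllocs, rawAllocs := allocsPerCall(syntheticKeys(100))
	fmt.Printf("hex string key: %.0f allocs/call\n", hexAllocs)
	fmt.Printf("[32]byte key:   %.0f allocs/call\n", rawAllocs)
}
```

The actual benchmark would live in `filter_test.go`, call `endpointInCooldown` on real endpoints, and report allocations with `b.ReportAllocs()`.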
   
   ## Files
   
   - `pkg/filter/llm/proxy/filter.go`
   - `pkg/filter/llm/proxy/filter_test.go`
   
   ## Scope note
   
   `cooldownKey` is unexported; this is a purely internal change with no 
upstream-visible API impact. Independent of #939 (inject cooldownStore) and 
#943 (container/list LRU) — can land in any order.



