mochengqian commented on issue #949:
URL: https://github.com/apache/dubbo-go-pixiu/issues/949#issuecomment-4642990871

   Picking this back up to move the design conversation forward (keeping it 
design-only — no implementation PR until we agree direction, per the issue 
header).
   
   ## Position on the open questions
   
   **1. Does the static-YAML path have legitimate duplicate-ID use cases?**
   I can't find one. ID is identity; two static entries with the same ID are 
two configs claiming to be the same runtime instance, which is incoherent. 
Blue/green and sister deployments are *different* instances and should carry 
different IDs (or be split across clusters) — they don't need ID reuse. So I'd 
treat static duplicate-ID as a hard error once the deprecation window closes.
   
   **2. Empty-ID hash-collision case — should hard-fail apply?**
   No, and I think this is the important nuance. When the caller supplies no 
ID, it is explicitly *not* claiming identity — it's saying "give me a runtime 
ID." Two empty-ID payloads that hash-collide but differ in content are two 
legitimate distinct endpoints (the static-assemble suffix rule already handles 
this correctly). Hard-failing here would punish callers for not claiming an 
identity they were never required to claim. **So the strict rule should apply 
only to the explicit-ID paths** (static YAML `id:` and dynamic `SetEndpoint` 
with an explicit ID), and the empty-ID hash path should keep its current 
suffix-append tolerance unchanged.
   
   That asymmetry matters because it means "strict unique IDs" is really 
"strict unique *explicit* IDs" — which is much safer to flip on by default, 
since it only rejects configs where the operator actually typed a colliding 
identity.
   
   **3. Detection before production**
   The Phase-1 metric below is the answer here — it lets operators see 
collisions in staging without changing behavior. No integration test can catch 
an operator's specific misconfig; observability is the right tool.
   
   **4. Version targets**
   Defer to maintainers, but the phased shape (observe → opt-in → default-on) 
is sound. The metric (Phase 1) is the only piece that's both immediately useful 
and zero-risk, so I'd start there regardless of when the strict flag lands.
   
   ## Proposed Phase 1 (the contributor-sized, zero-behavior-change starting 
point)
   
   Just the observability, no enforcement:
   
   - Add a counter `pixiu_endpoint_id_collision_total{cluster, source}` where 
`source` ∈ `{static, dynamic_explicit, dynamic_hash}`, incremented wherever a 
collision currently triggers a suffix/replace (the existing WARN sites in 
`assembleClusterEndpoints` and `resolveSetEndpointSlotByHash`).
   - Keep the existing WARN logs; add a one-line deprecation note pointing at 
this issue.
   
   This is observable, reversible, and commits us to nothing. If maintainers 
are OK with the metric (and with OTel as the mechanism, consistent with the 
snapshot-publish metrics added in #962), I'm happy to take Phase 1 as a small 
PR and leave the strict-flag phases (2/3) for separate discussion once we have 
collision data from real deployments.
   
   Does the explicit-vs-empty-ID asymmetry in (2) match how others read the 
dedup contract? If so I'll write up Phase 1 against the metric mechanism in 
#962.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to