XnLemon opened a new pull request, #3369:
URL: https://github.com/apache/dubbo-go/pull/3369

   ### Description
   
   Fixes #3352
   
   ---
   
   #### Background / 背景
   
   The [#2534 refactor](https://github.com/apache/dubbo-go/pull/2534) 
centralized metadata logic and settled on an application-level metadata path 
where `MetadataReport` handles app-level `MetadataInfo` and service-app 
mapping. That refactor was correct in intent, but it left the **selection 
semantics of `MetadataReport` under-specified** for multi-registry and 
multi-instance deployments.
   
   Concretely, when a process connects to more than one registry (e.g. Nacos + 
Zookeeper), each registry gets its own `MetadataReport` instance, keyed by 
`registryId` in the `instances` map. The pre-existing code had four independent 
correctness gaps in how it chose which report to use.
   
   #2534 重构将 metadata 逻辑集中化,以应用级 `MetadataInfo` 和 service-app mapping 
为核心路径。重构方向正确,但在**多注册中心和多实例部署场景下,`MetadataReport` 的选择语义存在空白**。
   
   具体来说,当进程同时连接多个注册中心(如 Nacos + Zookeeper)时,每个注册中心会在 `instances` map 中保有独立的 
`MetadataReport` 实例。原有代码在选择使用哪个 report 时存在以下四个独立的正确性问题。
   
   ---
   
   #### Problems Fixed / 修复的问题
   
   **1. `GetMetadataReport()` returned a non-deterministic report / 返回结果不确定**
   
   Go map iteration order is randomized. `GetMetadataReport()` returned 
whichever entry happened to come first — a different report on every call in 
some runs, making any downstream behavior non-reproducible.
   
   Fix: prefer the `"default"` key; fall back to the lexicographically first 
`registryId` so the result is always stable.
   
   Go map 迭代顺序随机,`GetMetadataReport()` 每次可能返回不同的 report。修复:优先返回 `"default"` 
键;不存在时退回到字典序最小的 `registryId`,确保结果稳定。
   
   ---
   
   **2. `GetMetadataReportByRegistry()` silently fell back to the wrong 
registry's report / 静默返回错误注册中心的 report**
   
   When a caller passed a specific `registryId` that was not registered, the 
function silently fell back to `GetMetadataReport()` and returned an unrelated 
registry's report. In `GetMetadataFromMetadataReport`, this meant remote 
metadata could be fetched from the wrong backend with no error surfaced.
   
   Fix: return `nil` + `Warnf` when a specific registryId is not found. The 
existing nil-guard in `GetMetadataFromMetadataReport` already surfaces this as 
an actionable error.
   
   调用方传入未注册的 `registryId` 时,函数静默退回到其他注册中心的 report,且无任何报错。修复:找不到时返回 `nil` + 
`Warnf`,由调用方的 nil 检查给出明确错误。
   
   ---
   
   **3. `ServiceNameMapping.Remove()` only returned the last error / 只保留最后一个错误**
   
   `Remove()` fanned out to all reports but used a `lastErr` accumulator 
overwritten on each iteration. If the first report failed and the second 
succeeded, the error was silently discarded and the caller saw `nil`, leaving a 
stale mapping in the first registry.
   
   Fix: collect all errors with `errors.Join` so the caller receives the full 
failure picture.
   
   `Remove()` 向所有 report 扇出,但 `lastErr` 在每次循环中被覆盖。若第一个 report 
失败、第二个成功,错误被静默丢弃。修复:使用 `errors.Join` 收集全部错误。
   
   ---
   
   **4. Revision calculation and remote metadata fetch used cross-registry data 
/ Revision 计算和远端 metadata 拉取使用了跨注册中心数据**
   
   The revision customizers called the global `GetMetadataService()` which 
merges services across all registries, so two instances on different registries 
could compute the same revision even though their actual service sets differed 
— breaking consumer-side change detection.
   
   The listener cache in `GetMetadataInfo()` was keyed by `revision` alone. Two 
registries producing the same revision string would collide, and the second 
registry's listener would silently receive the first registry's cached metadata.
   
   Fix: thread `registryId` through `createInstance()`, both revision 
customizers, `GetMetadataFromMetadataReport()`, 
`NewServiceInstancesChangedListener()`, and `GetMetadataInfo()`. Change the 
cache key to `registryId + ":" + revision`.
   
   Revision customizer 通过全局 `GetMetadataService()` 
获取服务列表,合并了所有注册中心的数据,导致不同注册中心的实例可能计算出相同的 revision。监听器的 metadata 缓存仅以 `revision` 
为键,多注册中心场景下会发生缓存污染。修复:将 `registryId` 贯穿传递到相关链路,缓存键改为 `registryId + ":" + 
revision`。
   
   ---
   
   #### Changes / 改动内容
   
   | File | Change |
   |---|---|
   | `metadata/report_instance.go` | `GetMetadataReport()` deterministic; 
`GetMetadataReportByRegistry()` returns nil for unknown id; add 
`ClearMetadataReportInstances()` for test isolation |
   | `metadata/client.go` | `GetMetadataFromMetadataReport()` takes 
`registryId`, routes to correct per-registry report |
   | `metadata/mapping/metadata/service_name_mapping.go` | `Remove()` collects 
all errors with `errors.Join` |
   | `registry/servicediscovery/service_instances_changed_listener_impl.go` | 
`GetMetadataInfo()` and `NewServiceInstancesChangedListener()` take 
`registryId`; cache key scoped to `registryId + ":" + revision` |
   | `registry/servicediscovery/service_discovery_registry.go` | 
`createInstance()` injects `registryId` into instance metadata; `SubscribeURL` 
passes `registryId` to listener |
   | `registry/servicediscovery/customizer/service_revision_customizer.go` | 
Read per-registry `MetadataInfo`; upgrade missing-registryId log to `Errorf` 
with accurate consequence description |
   | 
`registry/servicediscovery/customizer/metadata_service_url_params_customizer.go`
 | Restore TODO with explanation — per-registry MetadataService URL support is 
tracked separately |
   
   ---
   
   #### Tests Added / 新增测试
   
   - `TestGetMetadataReportIsDeterministic` — 20 iterations verify stability 
under Go's randomized map iteration
   - `TestGetMetadataReportByRegistry` / 
`TestGetMetadataReportByRegistryFallsBackDeterministically` — hit, miss, and 
empty-string paths
   - `TestServiceNameMappingRemoveFansOutToAllReports` — both reports called on 
success
   - `TestServiceNameMappingRemoveCollectsAllErrors` — both errors visible via 
`errors.Is` in joined return
   - `TestServiceNameMappingRemoveContinuesAfterPartialFailure` — loop does not 
short-circuit on first failure
   - `TestExportedRevisionIsRegistryScoped` / 
`TestSubscribedRevisionIsRegistryScoped` — different registries produce 
different revisions
   - `TestExportedRevisionMissingRegistryIdYieldsZero` / 
`TestSubscribedRevisionMissingRegistryIdYieldsZero` — missing `RegistryIdKey` 
yields revision `"0"`, does not borrow another registry's data
   - `TestGetMetadataFromMetadataReport` (rewritten) — specific registryId 
routes to correct report; unknown registryId returns error, not wrong report
   - `TestListenerUsesRegistryIdToFetchRemoteMetadata` — end-to-end: registryId 
flows from `NewServiceInstancesChangedListener` through `OnEvent` → 
`GetMetadataInfo` → `GetMetadataFromMetadataReport` → correct mock report called
   
   ---
   
   #### TODO / 本 PR 还未做完
   
   Multi-instance metadata service URL alignment 
(`metadata_service_url_params_customizer.go`): `GetMetadataService()` still 
returns a global singleton so all instances across registries receive the same 
metadata service URL.
   
   多实例 metadata service URL 对齐:`GetMetadataService()` 仍返回全局单例,所有注册中心实例获得同一 
metadata service URL。
   
   ---
   
   ### Checklist
   - [X] I confirm the target branch is `develop`
   - [X] Code has passed local testing
   - [X] I have added tests that prove my fix is effective or that my feature 
works
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to