Copilot commented on code in PR #12867:
URL: https://github.com/apache/apisix/pull/12867#discussion_r2726744450
##########
apisix/discovery/nacos/init.lua:
##########
@@ -350,10 +356,13 @@ local function fetch_from_host(base_uri, username,
password, services)
for key, nodes in pairs(nodes_cache) do
local content = core.json.encode(nodes)
nacos_dict:set(key, content)
+ local nodes_version = ngx.crc32_long(content)
Review Comment:
`nodes_version` is derived from `core.json.encode(nodes)` output, but
`cjson.encode` does not guarantee stable object key ordering (see
apisix/core/json.lua:38-41). This can cause the CRC to change even when the
nodes data is logically unchanged, defeating the version-controlled LRU cache
and re-triggering JSON decoding. Consider generating the version from a stable
representation (e.g., `core.json.stably_encode` and/or sorting `nodes` before
encoding) so the version only changes when the data changes.
```suggestion
local stable_content = core.json.stably_encode(nodes)
local nodes_version = ngx.crc32_long(stable_content)
```
##########
apisix/discovery/nacos/init.lua:
##########
@@ -32,13 +32,19 @@ local string_sub = string.sub
local str_byte = string.byte
local str_find = core.string.find
local log = core.log
+local process = require("ngx.process")
local default_weight
local nacos_dict = ngx.shared.nacos --key: namespace_id.group_name.service_name
if not nacos_dict then
error("lua_shared_dict \"nacos\" not configured")
end
+local nodes_lrucache = core.lrucache.new({
+ ttl = 300, -- 5分钟TTL
+ count = 1024
Review Comment:
The inline comment for the LRU cache TTL is in Chinese ("5分钟TTL"), while the
surrounding codebase uses English comments. Please change it to an English
comment (or remove it) to match repository conventions (e.g.,
apisix/discovery/kubernetes/init.lua:39-42).
##########
apisix/discovery/nacos/init.lua:
##########
@@ -419,8 +439,12 @@ function _M.init_worker()
log.info('fetch_interval:', fetch_interval)
access_key = local_conf.discovery.nacos.access_key
secret_key = local_conf.discovery.nacos.secret_key
- ngx_timer_at(0, fetch_full_registry)
- ngx_timer_every(fetch_interval, fetch_full_registry)
+
+ -- Only privileged agent should fetch nacos registry
+ if process.type() == "privileged agent" then
+ ngx_timer_at(0, fetch_full_registry)
+ ngx_timer_every(fetch_interval, fetch_full_registry)
+ end
Review Comment:
The behavioral change to fetch the registry only in the privileged agent and
the new version-controlled LRU caching path are not covered by existing nacos
discovery tests. Please add/extend tests (see t/discovery/nacos*.t) to assert
that only one process performs nacos fetches and that workers can still resolve
nodes across reloads / when cache versioning is in effect.
##########
apisix/discovery/nacos/init.lua:
##########
@@ -350,10 +356,13 @@ local function fetch_from_host(base_uri, username,
password, services)
for key, nodes in pairs(nodes_cache) do
local content = core.json.encode(nodes)
nacos_dict:set(key, content)
+ local nodes_version = ngx.crc32_long(content)
+ nacos_dict:set(key .. "#version", nodes_version)
Review Comment:
Writes to the shared dict ignore return values, which can leave `key` and
`key#version` inconsistent under shm pressure (e.g., content set fails but
version set succeeds, or vice versa). This can cause workers to treat a service
as missing or read mismatched data. Please follow the established pattern used
in other discovery modules (e.g., apisix/discovery/tars/init.lua:78-93): use
`safe_set` (or check `set` return values), and on failure, avoid publishing the
version without content (and/or roll back by deleting the companion key).
```suggestion
local ok, err = nacos_dict:safe_set(key, content)
if not ok then
log.error("failed to set nacos discovery content for key: ", key,
", error: ", err)
-- avoid leaving an inconsistent state
nacos_dict:delete(key)
nacos_dict:delete(key .. "#version")
else
local nodes_version = ngx.crc32_long(content)
local ok_ver, err_ver = nacos_dict:safe_set(key .. "#version",
nodes_version)
if not ok_ver then
log.error("failed to set nacos discovery version for key: ",
key,
", error: ", err_ver)
-- rollback content to keep key and key#version consistent
nacos_dict:delete(key)
nacos_dict:delete(key .. "#version")
end
end
```
##########
apisix/discovery/nacos/init.lua:
##########
@@ -396,19 +405,30 @@ local function fetch_full_registry(premature)
end
+local function load_nodes_from_dict(key)
+ local value = nacos_dict:get(key)
+ if not value then
+ core.log.error("nacos service not found: ", key)
+ return nil
+ end
+ local nodes = core.json.decode(value)
+ return nodes
+end
+
function _M.nodes(service_name, discovery_args)
local namespace_id = discovery_args and
discovery_args.namespace_id or default_namespace_id
local group_name = discovery_args
and discovery_args.group_name or default_group_name
local key = get_key(namespace_id, group_name, service_name)
- local value = nacos_dict:get(key)
- if not value then
- core.log.error("nacos service not found: ", service_name)
+
+ local nodes_version = nacos_dict:get(key .. "#version")
+ if not nodes_version then
+ core.log.error("nacos service version not found: ", key)
return nil
Review Comment:
`_M.nodes` now hard-requires `key#version` and returns nil if it's missing.
On hot reload/upgrade, the shared dict can still contain the previous `key`
value without a corresponding `#version` entry, causing transient 503s until
the privileged agent refreshes. It can also happen if the version write fails.
To keep behavior backward compatible and more robust, consider falling back to
reading/decoding `key` when `#version` is absent (and optionally backfilling
the version).
```suggestion
-- Fallback for legacy / transient states where only `key` exists
core.log.warn("nacos service version not found, fallback to legacy
key: ", key)
local nodes = load_nodes_from_dict(key)
if not nodes then
-- No data under legacy key either; keep existing behavior
return nil
end
-- Backfill a default version so subsequent lookups are consistent
nodes_version = 0
local ok, err = nacos_dict:safe_set(key .. "#version", nodes_version)
if not ok then
core.log.warn("failed to backfill nacos service version for key:
",
key, ", err: ", err)
end
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]