Falven opened a new issue, #12662:
URL: https://github.com/apache/apisix/issues/12662
### Current Behavior
When running APISIX 3.13.0 in file-driven standalone mode
(deployment.role=data_plane, config_provider=yaml), the `/status/ready` health
check endpoint always returns HTTP 503 with error "worker id: X has not
received configuration", despite:
- Routes working correctly
- Configuration being successfully loaded from apisix.yaml
- All workers functioning normally
Example error response:
```json
{"error":"worker id: 0 has not received configuration","status":"error"}
```
### Expected Behavior
The `/status/ready` endpoint should return HTTP 200 with `{"status":"ok"}`
when all workers have successfully loaded the configuration from the YAML file.
### Error Logs
```
2025/01/10 00:41:47 [warn] 33#33: *3 [lua] init.lua:1003: status_ready():
worker id: 0 has not received configuration, context: ngx.timer
```
### Steps to Reproduce
1. Configure APISIX in file-driven standalone mode:
```yaml
# config.yaml
deployment:
  role: data_plane
  role_data_plane:
    config_provider: yaml
apisix:
  enable_admin: false
```
2. Create a valid apisix.yaml with routes
3. Start APISIX
4. Test the health check endpoint:
```bash
curl http://127.0.0.1:7085/status/ready
```
5. Observe HTTP 503 error despite routes working correctly
### Environment
- APISIX version: 3.13.0
- Operating System: Docker (apache/apisix:3.13.0-debian)
- OpenResty / Nginx version: From official image
- Deployment mode: data_plane with yaml config_provider
### Root Cause Analysis (UPDATED)
After extensive debugging with added logging, I've identified the actual
root cause. The issue occurs when the configuration file is rendered **before**
APISIX starts (common in container environments):
**Timing Issue:**
1. Configuration file (`apisix.yaml`) is created by an entrypoint script
before APISIX starts
2. Master process reads the file during startup, setting `apisix_yaml_mtime`
global variable
3. Workers initialize and call `sync_status_to_shdict(false)` marking
themselves as **unhealthy**
4. Workers create timers that call `read_apisix_config()` every second
5. **Critical bug**: `read_apisix_config()` checks if file mtime has changed:
```lua
if apisix_yaml_mtime == last_modification_time then
    return -- File hasn't changed, return early
end
```
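6. Because the file was rendered before startup and is not modified afterwards, its mtime always equals the value recorded at startup, so the check returns early on every tick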
7. `update_config()` is **never called** by workers
8. Workers remain marked as unhealthy forever
9. `/status/ready` endpoint fails perpetually
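
The sequence above can be reproduced in isolation with a small plain-Lua model. This is only a sketch of the logic described in this issue, not APISIX code: `status_report` stands in for the shared dict, `master_load` and `timer_tick` are hypothetical names, and the mtime value is made up.

```lua
-- Toy model of the startup sequence described above (not APISIX code).
local status_report = {}        -- stands in for ngx.shared["status-report"]
local apisix_yaml_mtime = nil   -- mtime recorded by whoever loads the file first

local function sync_status(worker_id, ok)
    status_report[worker_id] = ok
end

-- The master loads the file once at startup. It is not a worker,
-- so no health flag is written on its behalf.
local function master_load(file_mtime)
    apisix_yaml_mtime = file_mtime
end

-- One timer tick in a worker: reload only when the mtime differs.
local function timer_tick(worker_id, file_mtime)
    if apisix_yaml_mtime == file_mtime then
        return false            -- early return, the flag is left untouched
    end
    apisix_yaml_mtime = file_mtime
    sync_status(worker_id, true)
    return true
end

local FILE_MTIME = 1736469707   -- hypothetical mtime of a pre-rendered apisix.yaml

master_load(FILE_MTIME)
sync_status(0, false)           -- init_worker marks worker 0 as unhealthy
for _ = 1, 5 do
    timer_tick(0, FILE_MTIME)   -- file untouched, so every tick returns early
end
local before = status_report[0]

timer_tick(0, FILE_MTIME + 1)   -- only an mtime change reaches the update path
local after = status_report[0]

print(before, after)
```

In this model the worker stays unhealthy for as long as the file is left alone and flips to healthy only once the mtime changes, which matches the behavior observed in the container.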
**Debug Evidence:**
Adding logging to `config_yaml.lua` confirmed:
- `update_config()` is only called once by the master process (PID 1) during
startup
- Master's call to `sync_status_to_shdict(true)` does nothing because it
checks `if process.type() ~= "worker" then return end`
- All 12 workers successfully create timers
- Timers fire every second but return early due to unchanged mtime
- Workers never call `update_config()`, thus never call
`sync_status_to_shdict(true)`
### Relevant Code
**apisix/core/config_yaml.lua** - Lines ~565-585:
```lua
function _M.init_worker()
    sync_status_to_shdict(false) -- Mark worker as unhealthy
    if is_use_admin_api() then
        apisix_yaml = {}
        apisix_yaml_mtime = 0
        return true
    end
    -- sync data in each non-master process
    -- Timer created but never calls update_config
    ngx.timer.every(1, read_apisix_config)
    return true
end
```
**apisix/core/config_yaml.lua** - Lines ~150-165:
```lua
local function read_apisix_config(premature, pre_mtime)
    if premature then
        return
    end
    local attributes, err = lfs.attributes(config_file.path)
    if not attributes then
        log.error("failed to fetch ", config_file.path, " attributes: ", err)
        return
    end
    local last_modification_time = attributes.modification
    if apisix_yaml_mtime == last_modification_time then
        return -- BUG: Returns early, never calls update_config()
    end
    -- This code is never reached if file hasn't changed since startup
    local config_new, err = config_file:parse()
    if err then
        log.error("failed to parse the content of file ", config_file.path,
                  ": ", err)
        return
    end
    update_config(config_new, last_modification_time)
    log.warn("config file ", config_file.path, " reloaded.")
end
```
**apisix/core/config_yaml.lua** - Lines ~136-148:
```lua
local function sync_status_to_shdict(status)
    if process.type() ~= "worker" then
        return -- Master process calls are ignored
    end
    local dict_name = "status-report"
    local key = worker_id()
    local shdict = ngx.shared[dict_name]
    local _, err = shdict:set(key, status)
    if err then
        log.error("failed to ", status and "set" or "clear",
                  " shdict " .. dict_name .. ", key=" .. key, ", err: ", err)
    end
end
```
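
For context, here is a rough model of how a readiness check could consume these per-worker flags. It is an assumption based on the error message and the `status_ready()` name in the log above, not a copy of the APISIX implementation. The function signature and the plain table standing in for the `status-report` dict are hypothetical.

```lua
-- Hypothetical readiness check over per-worker flags (not APISIX code).
-- `report` stands in for the "status-report" shared dict, keyed by worker id.
local function status_ready(report, worker_count)
    for id = 0, worker_count - 1 do
        if not report[id] then
            local msg = "worker id: " .. id .. " has not received configuration"
            return 503, { status = "error", error = msg }
        end
    end
    return 200, { status = "ok" }
end

-- One worker still flagged false: the whole check fails.
local code1, body1 = status_ready({ [0] = true, [1] = false }, 2)
print(code1, body1.error)

-- Every worker flagged true: the check passes.
local code2, body2 = status_ready({ [0] = true, [1] = true }, 2)
print(code2, body2.status)
```

Under this model a single worker whose flag was never set to `true` is enough to keep the endpoint at 503, which is why the early return in `read_apisix_config()` is fatal for the health check.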
### Proposed Solution
In `init_worker()`, immediately call `update_config()` after creating the
timer to mark the worker as healthy:
```lua
function _M.init_worker()
    sync_status_to_shdict(false)
    if is_use_admin_api() then
        apisix_yaml = {}
        apisix_yaml_mtime = 0
        return true
    end
    -- sync data in each non-master process
    ngx.timer.every(1, read_apisix_config)
    -- FIX: Mark worker as healthy immediately if config already loaded
    if apisix_yaml then
        update_config(apisix_yaml, apisix_yaml_mtime)
    end
    return true
end
```
This ensures workers are marked healthy on initialization, before the timer
even fires. The timer will still update configuration when the file changes.
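
The effect of the change can be illustrated with a plain-Lua toy model that compares a worker with and without the extra call. It is a sketch only: `new_worker`, `init`, and `tick` are hypothetical names, and `healthy` stands in for the flag written by `sync_status_to_shdict()`.

```lua
-- Toy model comparing worker init with and without the proposed call
-- (not APISIX code).
local function new_worker(preloaded_mtime)
    local w = { healthy = false, mtime = preloaded_mtime }

    function w:update_config(mtime)
        self.mtime = mtime
        self.healthy = true     -- stands in for sync_status_to_shdict(true)
    end

    function w:tick(file_mtime)
        if self.mtime == file_mtime then
            return              -- unchanged file, nothing to do
        end
        self:update_config(file_mtime)
    end

    function w:init(apply_fix)
        self.healthy = false    -- stands in for sync_status_to_shdict(false)
        if apply_fix and self.mtime then
            self:update_config(self.mtime)
        end
    end

    return w
end

local MTIME = 100               -- hypothetical mtime of the pre-rendered file

local unpatched = new_worker(MTIME)
unpatched:init(false)
unpatched:tick(MTIME)           -- returns early, worker stays unhealthy

local patched = new_worker(MTIME)
patched:init(true)
patched:tick(MTIME)             -- still returns early, flag is already set

print(unpatched.healthy, patched.healthy)

patched:tick(MTIME + 1)         -- a later file change is still picked up
print(patched.mtime)
```

In the model, the patched worker is healthy right after initialization and still follows later mtime changes, while the unpatched worker stays unhealthy until the file is modified.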
### Verified Fix
I patched the code in a running container and confirmed:
- All 12 workers call `update_config()` in `init_worker_by_lua*` context
- `/status/ready` returns `{"status":"ok"}` with HTTP 200
- Docker health check passes (container shows "healthy" status)
- Routes continue working correctly
### Impact
This bug affects production deployments using:
- Kubernetes readiness probes with file-driven standalone mode
- Docker health checks
- Load balancers that depend on `/status/ready` endpoint
- Any container orchestration that renders config files before starting
APISIX
The health check always fails, preventing proper deployment orchestration,
even though APISIX is functioning correctly and serving traffic.
### Additional Context
The bug is specific to the timing of when the configuration file is created
relative to APISIX startup. If the file is created and never modified, workers
never get marked as healthy. This is a common pattern in containerized
deployments where entrypoint scripts render configuration from environment
variables before starting the main process.