jizhuozhi opened a new issue, #12937:
URL: https://github.com/apache/apisix/issues/12937

   ### Current Behavior
   
   ## Bug Description
   
   There is a bug in the Consul service discovery implementation where a single 
malformed node (missing `Service` field) can cause the entire service to become 
unavailable.
   
   ## Root Cause
   
   The `:: CONTINUE ::` label is placed **outside** the inner `for` loop (line 
648), but the `goto CONTINUE` statement is used **inside** the inner loop (line 
578). This causes the goto statement to skip not only the current node but also:
   - All remaining nodes in the result
   - The sorting logic
   - The critical `up_services[service_name] = nodes` assignment
   
   ## Affected Code
   
   File: `apisix/discovery/consul/init.lua`
   
   ```lua
   for service_name, _ in pairs(catalog_res.body) do
       -- ... get nodes from consul ...
       
       for _, node in ipairs(result.body) do
           local service = node.Service
           if not service then
               goto CONTINUE  -- Line 578: BUG! This jumps outside the inner 
loop
           end
           
           -- ... process node ...
       end  -- Inner loop ends here (line 633)
       
       -- Sorting logic
       if nodes then
           -- ... sort nodes ...
       end
       up_services[service_name] = nodes
   end  -- Outer loop ends here
   
   :: CONTINUE ::  -- Line 648: Label is HERE, outside both loops!
   ```
   
   ## Impact
   
   When a single node in the Consul response has a missing `Service` field:
   1. The `goto CONTINUE` is triggered
   2. It jumps to line 648, **skipping the entire remaining processing for that 
service**
   3. The `up_services[service_name] = nodes` assignment is never executed
   4. APISIX does not update the node list for that service
   5. If this happens during startup or after a previous service removal, the 
service becomes completely unavailable
   
   ## Proposed Fix
   
   Move the `:: CONTINUE ::` label inside the inner loop, or rename the labels 
to avoid confusion:
   
   ```lua
   for service_name, _ in pairs(catalog_res.body) do
       -- ... get nodes from consul ...
       
       for _, node in ipairs(result.body) do
           local service = node.Service
           if not service then
               goto CONTINUE_NODE  -- Skip only this node
           end
           
           -- ... process node ...
           
           :: CONTINUE_NODE ::  -- Label inside the inner loop
       end  -- Inner loop ends
       
       -- Sorting and assignment continue normally
       if nodes then
           -- ... sort nodes ...
       end
       up_services[service_name] = nodes
   end
   
   :: CONTINUE ::  -- Keep outer label for service-level skipping
   ```
   
   ## Additional Notes
   
   This bug has likely existed for a long time but may have gone unnoticed due 
to the rarity of malformed Consul responses. However, it represents a critical 
reliability issue as a single bad node entry can cause complete service 
unavailability.
   
   ### Expected Behavior
   
   When a single node is malformed, only that specific node should be skipped. 
Other healthy nodes should still be processed and added to `up_services`.
   
   ### Error Logs
   
   None
   
   ### Steps to Reproduce
   
   1. Register a service in Consul with multiple instances
   2. Somehow create a node entry with a missing `Service` field (e.g., through 
API manipulation or data corruption)
   3. APISIX will fail to discover ANY nodes for that service, even the healthy 
ones
   4. Requests to that service will return "no valid upstream node" errors
   
   ### Environment
   
   - APISIX version (run `apisix version`):
   - Operating system (run `uname -a`):
   - OpenResty / Nginx version (run `openresty -V` or `nginx -V`):
   - etcd version, if relevant (run `curl 
http://127.0.0.1:9090/v1/server_info`):
   - APISIX Dashboard version, if relevant:
   - Plugin runner version, for issues related to plugin runners:
   - LuaRocks version, for installation issues (run `luarocks --version`):
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to