nic-6443 opened a new pull request, #13513: URL: https://github.com/apache/apisix/pull/13513
### Description

A single malformed entry in the consul health API response (an entry without the `Service` field, e.g. from an agent on a reclaimed cloud instance) currently wipes out the whole service.

The per-node check in `fetch_services_from_server()` does `goto CONTINUE`, but the `::CONTINUE::` label sits outside the node loop, so one bad entry skips all the remaining nodes, the sort, and the `up_services[key] = nodes` assignment. If the malformed entry comes first, the service ends up with no nodes, `update_all_services()` deletes it from the shared dict, and requests start failing with "no valid upstream node".

The fix makes the skip local to the node loop (`goto CONTINUE_NODE` with the label at the end of the loop body) and logs a warning for the skipped entry. After the change, a malformed entry only drops that one node and the remaining healthy nodes keep serving traffic.

Added a regression test (`t/discovery/consul-malformed-node.t`) with a mock consul server returning one malformed entry followed by two valid ones; it fails without the fix and passes with it.

#### Which issue(s) this PR fixes:

Fixes #12937

### Checklist

- [x] I have explained the need for this PR and the problem it solves
- [x] I have explained the changes or the new features added to this PR
- [x] I have added tests corresponding to this change
- [ ] I have updated the documentation to reflect this change
- [x] I have verified that this change is backward compatible (If not, please discuss on the [APISIX mailing list](https://github.com/apache/apisix/tree/master#community) first)
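For readers who want to see the label-placement problem in isolation, here is a minimal sketch in plain Lua. It is not the APISIX source: `collect_buggy`, `collect_fixed`, the `services` table shape, and the use of `io.stderr` in place of the real warning log are all assumptions made for illustration. Only the `CONTINUE` / `CONTINUE_NODE` label names and the `Service` field come from the description above.

```lua
-- Hypothetical, simplified stand-in for the node loop described in this PR.
-- `services` maps a service key to a list of health-API-like entries.

-- Buggy shape: the label sits after the node loop, so the first malformed
-- entry jumps past the remaining nodes and past the assignment.
local function collect_buggy(services)
    local up_services = {}
    for key, entries in pairs(services) do
        local nodes = {}
        for _, entry in ipairs(entries) do
            if not entry.Service then
                goto CONTINUE            -- leaves the inner loop entirely
            end
            nodes[#nodes + 1] = {
                host = entry.Service.Address,
                port = entry.Service.Port,
            }
        end
        up_services[key] = nodes         -- skipped when any entry is malformed
        ::CONTINUE::
    end
    return up_services
end

-- Fixed shape: the label is the last statement of the node loop body,
-- so a malformed entry only skips itself.
local function collect_fixed(services)
    local up_services = {}
    for key, entries in pairs(services) do
        local nodes = {}
        for _, entry in ipairs(entries) do
            if not entry.Service then
                -- stand-in for the warning log mentioned in the description
                io.stderr:write("skipping entry without Service field\n")
                goto CONTINUE_NODE
            end
            nodes[#nodes + 1] = {
                host = entry.Service.Address,
                port = entry.Service.Port,
            }
            ::CONTINUE_NODE::
        end
        up_services[key] = nodes
    end
    return up_services
end

-- One malformed entry followed by two valid ones, as in the regression test.
local services = {
    web = {
        { Node = { Node = "stale-agent" } },                 -- no Service field
        { Service = { Address = "10.0.0.1", Port = 8080 } },
        { Service = { Address = "10.0.0.2", Port = 8080 } },
    },
}
```

With this input, `collect_buggy(services).web` is `nil` because the assignment is never reached, while `collect_fixed(services).web` holds the two valid nodes.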
