[I] bug: APISIX Ingress Controller v2.0.0 - Service Inline Upstreams Not Updated on Endpoint Changes [apisix-ingress-controller]

via GitHub Mon, 29 Dec 2025 06:16:22 -0800


jasaulakh1988 opened a new issue, #2689:
URL: https://github.com/apache/apisix-ingress-controller/issues/2689


   ### Current Behavior
   
   # Bug Report: APISIX Ingress Controller v2.0.0 - Service Inline Upstreams 
Not Updated on Endpoint Changes
   
   ## Summary
   
   APISIX Ingress Controller v2.0.0 (with ADC sidecar) creates Services with 
inline upstreams, but does NOT update these inline upstreams when Kubernetes 
Endpoints change (e.g., pod restarts, rescheduling). This causes traffic to be 
routed to stale/non-existent pod IPs, resulting in 504 Gateway Timeout errors.
   
   ## Environment
   
   - **APISIX Gateway Version:** 3.11.0
   - **APISIX Ingress Controller Version:** 2.0.0 (stable release)
   - **Helm Chart Version:** 1.1.0 (official Bitnami chart)
   - **Kubernetes Version:** OVH Managed Kubernetes
   - **etcd:** 3-node cluster
   
   ## Controller Configuration
   
   ```yaml
   provider:
     type: apisix
     syncPeriod: 1m
     initSyncDelay: 30s
   ```
   
   
   ## Workaround
   
   Manually overwrite the stale Service entry in etcd:
   ```bash
   kubectl exec -n apisix-system apisix-etcd-0 -- \
     etcdctl put /apisix/services/<service-id> '<updated-json-with-correct-ip>'
   ```
   
   Then send a HUP signal to the APISIX pods to reload the configuration:
   ```bash
   kubectl exec -n apisix-system <apisix-pod> -- kill -HUP 1
   ```
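
To prepare the `<updated-json-with-correct-ip>` value for the `etcdctl put` above, a minimal Python sketch follows. It assumes the Service and Upstream documents have already been read from etcd as plain JSON strings (without explanatory comments). The helper name `patch_service_nodes` is hypothetical and is not part of APISIX, ADC, or the controller.

```python
import json


def patch_service_nodes(service_json: str, upstream_json: str) -> str:
    """Return the Service JSON with its inline upstream nodes replaced
    by the nodes of the separately synced Upstream object.

    Hypothetical helper for the manual workaround; not an APISIX API.
    """
    service = json.loads(service_json)
    upstream = json.loads(upstream_json)
    if "upstream" not in service:
        raise ValueError("Service has no inline upstream to patch")
    # Only the node list is replaced; every other field is kept as-is.
    service["upstream"]["nodes"] = upstream["nodes"]
    return json.dumps(service, separators=(",", ":"))


if __name__ == "__main__":
    # Sample values mirror the IPs reported in this issue.
    stale_service = json.dumps({
        "id": "f08f5c87",
        "upstream": {
            "type": "roundrobin",
            "nodes": [{"host": "10.2.4.10", "port": 4000, "weight": 100}],
        },
    })
    fresh_upstream = json.dumps({
        "id": "f08f5c87",
        "nodes": [{"host": "10.2.16.3", "port": 4000, "weight": 100}],
    })
    print(patch_service_nodes(stale_service, fresh_upstream))
```

The printed string can then be passed as the value argument of `etcdctl put`. The sketch deliberately leaves `update_time` and all other fields untouched.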
   
   ## Impact
   
   - **Severity:** Critical for production use
   - **Impact:** Complete service outage for affected routes when pods restart
   - **Affected:** Any route using the ADC sync pattern with inline service 
upstreams
   
   ## Additional Observations
   
   1. **Hot reload not working:** Even though APISIX supports hot reload from 
etcd, the Service updates are never pushed to etcd in the first place.
   
   2. **Pattern difference:** Routes created with the newer controller version 
use `upstream_id` references instead of inline upstreams. These DO get updated 
correctly. The issue affects routes/services created before this pattern change.
   
   3. **No errors in controller logs:** The controller doesn't log any errors 
about failing to update services. The sync appears to complete successfully but 
simply doesn't update inline upstreams.
   
   ## Requested Action
   
   1. Investigate why Service inline upstreams are not updated on endpoint 
changes
   2. Either:
      - Fix the ADC sync to update inline upstreams in Services, OR
      - Change the sync pattern to always use `upstream_id` references instead 
of inline upstreams
   3. Document this limitation if it's expected behavior
   
   ## Related Information
   
   - Controller logs show sync completing with correct service count
   - `syncPeriod: 1m` is being respected (syncs every minute)
   - Separate Upstream objects are updated on each sync cycle
   - Service objects are NOT updated after initial creation
   
   ## Contact
   
   Happy to provide additional logs, configurations, or test scenarios to help 
debug this issue.
   
   ### Expected Behavior
   
   When a Kubernetes pod restarts and gets a new IP address, the APISIX Ingress 
Controller should update the upstream nodes in APISIX to reflect the new pod 
IP. Traffic should continue flowing to the new pod IP without interruption.
   
   ### Actual Behavior
   
   1. Controller creates **Services with inline upstreams** containing pod IPs
   2. Controller also creates **separate Upstream objects** with the same pod 
IPs
   3. When pods restart and get new IPs:
      - The **separate Upstream objects ARE updated** with new IPs ✅
      - The **inline upstreams inside Services are NOT updated** ❌
   4. Routes reference Services (via `service_id`), not the separate Upstream 
objects
   5. Traffic continues to be routed to **stale pod IPs** that no longer exist
   6. Results in **504 Gateway Timeout** errors
   
   ### Error Logs
   
   ### Evidence
   
   **Service in etcd (NOT updated - shows stale IP):**
   ```json
   {
     "id": "f08f5c87",
     "name": "default_beta-websocket-routes_0",
     "update_time": 1766750592,  // December 26 - 3 days old!
     "upstream": {
       "type": "roundrobin",
       "nodes": [
         {
           "host": "10.2.4.10",   // OLD IP - pod no longer exists!
           "port": 4000,
           "weight": 100
         }
       ]
     }
   }
   ```
   
   **Upstream object in etcd (updated correctly):**
   ```json
   {
     "id": "f08f5c87",
     "name": "default_beta-websocket-routes_0",
     "update_time": 1767016352,  // Today - recently updated
     "nodes": [
       {
         "host": "10.2.16.3",    // CORRECT new IP
         "port": 4000,
         "weight": 100
       }
     ]
   }
   ```
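
The `update_time` values in the two documents above are Unix timestamps in seconds. As a quick check of the dates noted in the comments, the following Python sketch converts them to UTC and computes the gap between them. The function names are illustrative only.

```python
from datetime import datetime, timezone


def to_utc(update_time: int) -> datetime:
    """Convert an epoch-seconds update_time value to an aware UTC datetime."""
    return datetime.fromtimestamp(update_time, tz=timezone.utc)


def lag_days(older: int, newer: int) -> float:
    """Gap between two epoch-seconds timestamps, in days."""
    return (newer - older) / 86400


service_time = 1766750592    # update_time of the Service (inline upstream)
upstream_time = 1767016352   # update_time of the separate Upstream object

print(to_utc(service_time).isoformat())    # 2025-12-26T12:03:12+00:00
print(to_utc(upstream_time).isoformat())   # 2025-12-29T13:52:32+00:00
print(round(lag_days(service_time, upstream_time), 2))  # 3.08
```

The Service was last written roughly three days before the Upstream object, which matches the "3 days old" note in the Service sample.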
   
   **Route configuration:**
   ```json
   {
     "name": "default_beta-websocket-routes_beta-game-core-api",
     "service_id": "f08f5c87",   // References Service, not Upstream
     "upstream_id": null         // Not using separate upstream
   }
   ```
   
   **APISIX error logs:**
   ```
   upstream timed out (110: Connection timed out) while connecting to upstream,
   upstream: "http://10.2.4.10:4000/...",  // Stale IP!
   ```
   
   **Kubernetes endpoint (actual pod IP):**
   ```
   NAME            ENDPOINTS        AGE
   game-core-web   10.2.16.3:4000   25d
   ```
   
   ## Root Cause Analysis
   
   The ADC (API Declarative CLI) sync mechanism appears to:
   
   1. Watch for EndpointSlice changes in Kubernetes
   2. Update the **separate Upstream objects** when endpoints change
   3. **NOT update the inline upstream configuration inside Service objects**
   
   Since Routes reference Services (which have inline upstreams), the stale IPs 
persist even though the separate Upstream objects have correct IPs.
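
The mismatch described above can be checked mechanically. The Python sketch below assumes the Service and Upstream documents have been fetched from etcd and parsed into dictionaries shaped like the Evidence samples. `stale_inline_nodes` is a hypothetical helper, not a controller or ADC API.

```python
from typing import List, Set, Tuple

Addr = Tuple[str, int]


def node_addrs(nodes: List[dict]) -> Set[Addr]:
    """Collect (host, port) pairs from an APISIX-style node list."""
    return {(n["host"], n["port"]) for n in nodes}


def stale_inline_nodes(service: dict, upstream: dict) -> Set[Addr]:
    """Addresses still present in the Service's inline upstream but
    absent from the separately synced Upstream object."""
    inline = node_addrs(service.get("upstream", {}).get("nodes", []))
    current = node_addrs(upstream.get("nodes", []))
    return inline - current


# Sample documents reduced to the fields that matter for the comparison.
service = {
    "id": "f08f5c87",
    "upstream": {"nodes": [{"host": "10.2.4.10", "port": 4000, "weight": 100}]},
}
upstream = {
    "id": "f08f5c87",
    "nodes": [{"host": "10.2.16.3", "port": 4000, "weight": 100}],
}

print(stale_inline_nodes(service, upstream))  # {('10.2.4.10', 4000)}
```

An empty result would mean the inline upstream and the Upstream object agree. Any returned address is one that routes using `service_id` may still be sending traffic to.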
   
   
   ### Steps to Reproduce
   
   ## Reproduction Steps
   
   1. Deploy APISIX Ingress Controller v2.0.0 with ADC sidecar
   2. Create an ApisixRoute resource pointing to a Kubernetes Service
   3. Wait for controller to sync (creates Service with inline upstream in 
APISIX)
   4. Note the pod IP in the APISIX Service's inline upstream
   5. Delete the pod (e.g., `kubectl delete pod <pod-name>`)
   6. Wait for new pod to start with a new IP
   7. Check the APISIX Service - inline upstream still has OLD IP
   8. Check the APISIX Upstream object - has correct NEW IP
   9. Traffic fails with 504 timeout to the old IP
   
   


