kachidoki opened a new issue, #2342:
URL: https://github.com/apache/apisix-ingress-controller/issues/2342
### Issue Faced
We are experiencing significant delays when deleting Routes in bulk. The
problem becomes evident once the cluster holds more than 10,000 Routes.
During bulk deletion, we observe noticeable latency and a clear spike in the
etcd monitoring.
Upon reviewing the code, I found the following segment which seems to be
causing the issue:
```
// TODO: Maintain a reference count for each object without having to poll each time
func (u *upstreamClient) deleteCheck(ctx context.Context, obj *v1.Upstream) (bool, error) {
	routes, _ := u.cluster.route.List(ctx)
	sroutes, _ := u.cluster.cache.ListStreamRoutes()
	if routes == nil && sroutes == nil {
		return true, nil
	}
	for _, route := range routes {
		if route.UpstreamId == obj.ID {
			return false, fmt.Errorf("can not delete this upstream, route.id=%s is still using it now", route.ID)
		}
	}
	for _, sroute := range sroutes {
		if sroute.UpstreamId == obj.ID {
			return false, fmt.Errorf("can not delete this upstream, stream_route.id=%s is still using it now", sroute.ID)
		}
	}
	return true, nil
}
```
The line
`routes, _ := u.cluster.route.List(ctx)`
causes the code to iterate through all routes in the cluster during every
deletion. This results in unnecessary overhead.
<img width="585" alt="image"
src="https://github.com/user-attachments/assets/cbfb28d1-b68d-4802-8791-503e43edb5ee"
/>
Additionally, the latency spikes observed in etcd monitoring are caused by
the etcd range calls made during the deletion of Routes.
I would like to understand why the code is fetching the routes from the
cluster every time rather than from the cache. Was this a design decision or is
it an oversight?
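
For discussion, here is a minimal sketch of the reference-tracking idea that the `TODO` comment hints at. It is not code from the controller: `UpstreamRefIndex`, `SetRoute`, `RemoveRoute`, and `DeleteCheck` are hypothetical names, and the sketch assumes the index is updated whenever a route is created, updated, or deleted. Stream routes would need the same treatment, for example through a second index of the same type.

```go
package main

import (
	"fmt"
	"sync"
)

// UpstreamRefIndex is a hypothetical in-memory index (not part of the
// controller) that records which route IDs reference each upstream ID.
type UpstreamRefIndex struct {
	mu         sync.RWMutex
	byRoute    map[string]string              // route ID -> upstream ID
	byUpstream map[string]map[string]struct{} // upstream ID -> set of route IDs
}

// NewUpstreamRefIndex returns an empty index.
func NewUpstreamRefIndex() *UpstreamRefIndex {
	return &UpstreamRefIndex{
		byRoute:    make(map[string]string),
		byUpstream: make(map[string]map[string]struct{}),
	}
}

// SetRoute records that routeID points at upstreamID, replacing any
// earlier reference held by the same route.
func (x *UpstreamRefIndex) SetRoute(routeID, upstreamID string) {
	x.mu.Lock()
	defer x.mu.Unlock()
	x.removeLocked(routeID)
	if upstreamID == "" {
		return
	}
	x.byRoute[routeID] = upstreamID
	set := x.byUpstream[upstreamID]
	if set == nil {
		set = make(map[string]struct{})
		x.byUpstream[upstreamID] = set
	}
	set[routeID] = struct{}{}
}

// RemoveRoute drops the reference held by routeID, if any.
func (x *UpstreamRefIndex) RemoveRoute(routeID string) {
	x.mu.Lock()
	defer x.mu.Unlock()
	x.removeLocked(routeID)
}

// removeLocked must be called with x.mu held for writing.
func (x *UpstreamRefIndex) removeLocked(routeID string) {
	upstreamID, ok := x.byRoute[routeID]
	if !ok {
		return
	}
	delete(x.byRoute, routeID)
	set := x.byUpstream[upstreamID]
	delete(set, routeID)
	if len(set) == 0 {
		delete(x.byUpstream, upstreamID)
	}
}

// DeleteCheck reports whether upstreamID is unreferenced. It consults
// only the index, so it issues no list call against the cluster.
func (x *UpstreamRefIndex) DeleteCheck(upstreamID string) (bool, error) {
	x.mu.RLock()
	defer x.mu.RUnlock()
	for routeID := range x.byUpstream[upstreamID] {
		return false, fmt.Errorf("can not delete this upstream, route.id=%s is still using it now", routeID)
	}
	return true, nil
}

func main() {
	idx := NewUpstreamRefIndex()
	idx.SetRoute("route-1", "upstream-a")
	ok, err := idx.DeleteCheck("upstream-a")
	fmt.Println(ok, err)

	idx.RemoveRoute("route-1")
	ok, err = idx.DeleteCheck("upstream-a")
	fmt.Println(ok, err)
}
```

With an index like this, each deletion check costs a map lookup instead of a full route listing. The trade-off is that the index has to stay consistent with the cluster state, so it would probably need to be rebuilt on controller restart or resync.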
### Logs
_No response_
### Steps to Reproduce
1. Install APISIX and the APISIX Ingress Controller.
2. Create more than 10,000 Route CRD resources.
3. Delete the Route CRD resources in bulk.
### Environment
- APISIX Ingress Controller Version:
We are using a self-developed version of the APISIX Ingress Controller based
on an older release, which differs from the latest official versions. However,
the code for deleting Routes, which is causing the performance issue, remains
consistent with the official versions.
- Kubernetes Cluster Version: v1.24.4.
- OS Version: CentOS 7.6 x86