nic-6443 opened a new pull request #6259: URL: https://github.com/apache/apisix/pull/6259
### What this PR does / why we need it:

Discussion in mailing list: https://lists.apache.org/thread/pfkf88h7v515t29xh6csxhhfbhcbt77j

> I have run into a problem with APISIX that I would like to discuss with you.
>
> APISIX has a configuration item, `etcd.resync_delay`: when a watch call to etcd returns an error, APISIX pauses for a while before launching the next watch request. As I understand it, this logic protects the etcd server from being overloaded by uninterrupted client retries after an unexpected failure.
>
> I think this protection mechanism is reasonable, but one of the error cases is a timeout, which only means that no event was generated for the watched key within the watch period (30s by default). This kind of error is expected, because the gateway configuration usually does not change frequently. Since timeout errors currently get no special handling, they also cause the next watch call to wait `etcd.resync_delay` seconds. This is very dangerous.
>
> For example, in the default configuration, when the user's upstream configuration does not change within 30s, APISIX suspends configuration synchronization for about 6-7 seconds (5s + jitter) and cannot respond to any upstream changes during this period.
>
> So I think timeout errors should bypass the resync delay logic. This is in line with the millisecond-level configuration synchronization claimed in the APISIX documentation.
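The proposed behavior can be sketched as follows. This is only an illustration in Python, not the APISIX implementation (which is written in Lua): the function names, the `"timeout"` error string, and the omission of jitter are all assumptions made for the example.

```python
from typing import Iterable, Optional


def delay_before_next_watch(err: Optional[str], resync_delay: float = 5.0) -> float:
    """Return how long to pause before the next watch request.

    `err` is None when the watch returned events, "timeout" when no event
    arrived within the watch period, or any other string for a real failure.
    These values are hypothetical; jitter is left out for simplicity.
    """
    if err is None or err == "timeout":
        # A timeout is expected when nothing changed, so re-watch immediately.
        return 0.0
    # Any other error: back off to protect the etcd server.
    return resync_delay


def total_pause(errors: Iterable[Optional[str]], resync_delay: float = 5.0) -> float:
    """Sum the pauses incurred over a sequence of watch results."""
    return sum(delay_before_next_watch(e, resync_delay) for e in errors)


if __name__ == "__main__":
    results = ["timeout", None, "connection refused", "timeout"]
    print(total_pause(results))
```

Under these assumptions only the `"connection refused"` result contributes a pause, so the two timeouts no longer open a window in which configuration changes go unnoticed.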
The impact of doing so: removing the resync delay after a timeout error means APISIX holds more concurrent etcd connections on average over time. For example, in the default configuration (`etcd.timeout=30`, `etcd.resync_delay=5`), delaying the resync after a timeout reduces the number of concurrent connections by about 1/6 (6/(6+30), taking the delay plus jitter as roughly 6s). I think this impact is negligible compared to configuration changes not taking effect in time.

### Pre-submission checklist:

* [x] Did you explain what problem this PR solves, or what new features have been added?
* [x] Have you added corresponding test cases?
* [ ] Have you modified the corresponding document?
* [x] Is this PR backward compatible? **If it is not backward compatible, please discuss on the [mailing list](https://github.com/apache/apisix/tree/master#community) first**

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
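The ~1/6 estimate in the impact paragraph above can be checked with a short calculation. This is a sketch under the assumption that each cycle consists of one full watch period followed by one pause, and that the connection is idle only during the pause; the function name is hypothetical.

```python
def idle_fraction(watch_timeout: float, pause: float) -> float:
    """Fraction of each watch cycle spent with no watch connection open.

    Assumes a cycle is `watch_timeout` seconds of watching followed by
    `pause` seconds of waiting before the next watch request.
    """
    return pause / (pause + watch_timeout)


if __name__ == "__main__":
    # Defaults from the text: 30s watch timeout, ~6s pause (5s + jitter).
    print(idle_fraction(30, 6))
    # With the pause removed after timeouts, the idle fraction drops to zero.
    print(idle_fraction(30, 0))
```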
