rawlinp edited a comment on pull request #6017:
URL: https://github.com/apache/trafficcontrol/pull/6017#issuecomment-879391068
> In practice, with a 1s cache, it's extremely unlikely

I disagree. I think the race condition is extremely likely to occur, even with a 1-second cache, and _especially_ once `t3c` is polling TO more frequently (which is the goal).

> We could handle that race by adding the cache time to the Update Status endpoint

Different TOs could all have different `cache_ms` settings, so this doesn't really solve the race. They probably shouldn't have different cache settings in practice, but they _can_, which means we can't really depend on per-TO-instance settings.

I appreciate the intent of making TO more scalable by adding things like timed caches and RWR, but I'm not sure it's worth the risk of sacrificing our data consistency. It seems far safer and more scalable to implement something like Cache Config Snapshots, where we can cache that data in memory with 100% data consistency and no possibility of race conditions. Rather than complicating the entire API by adding a new layer that every endpoint has to go through, we'd have a single endpoint with an in-memory cache (one that isn't timed and is easily invalidated whenever a new snapshot is taken). It just seems easier to make a single endpoint really scalable (purpose-built for `t3c`) than to make the entire API scalable by introducing data consistency risk.

We can also improve the endpoints `t3c` requests that seem to make the biggest impact on TODB load (e.g. `deliveryserviceservers`, `jobs`, etc.). I'm pretty sure `t3c` requests every invalidation job that has ever been submitted every single time it processes a revalidation, so we can definitely improve that (https://github.com/apache/trafficcontrol/issues/5674), and `t3c` doesn't need the entirety of the `deliveryserviceservers` data, although I'm not sure it's necessary to improve that since those are going away in favor of topologies.
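To illustrate what I mean by an untimed, invalidation-based cache, here's a minimal sketch in Go. The names (`SnapshotCache`, `Take`, `Get`) are hypothetical, not actual TO code — the point is just that readers always see one complete snapshot, and the cache is replaced exactly when a new snapshot is taken, so there's no TTL window where stale and fresh data can race:

```go
package main

import (
	"fmt"
	"sync"
)

// SnapshotCache holds the latest snapshot in memory. There is no timeout:
// the cached copy is only ever replaced wholesale when a new snapshot is
// taken, so every reader sees a single consistent snapshot.
type SnapshotCache struct {
	mu   sync.RWMutex
	data []byte // serialized snapshot; a real cache might hold a struct
}

// Take atomically replaces the cached snapshot. Calling this only when a
// new snapshot is taken means invalidation happens exactly when the
// underlying data changes, never on a timer.
func (c *SnapshotCache) Take(snapshot []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data = snapshot
}

// Get serves the current snapshot without touching the database.
func (c *SnapshotCache) Get() []byte {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.data
}

func main() {
	c := &SnapshotCache{}
	c.Take([]byte(`{"cdn":"example"}`))
	fmt.Printf("%s\n", c.Get())
}
```

A single snapshot endpoint backed by something like this could absorb nearly all of `t3c`'s read load without any of the consistency risk of per-endpoint timed caches.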
However, if the alternative is data consistency risk, it might make sense to at least filter the `deliveryserviceservers` API by CDN. We also have the IMS changes you recently added to `t3c`, and I'm sure those will help improve performance a bit. We're also running TODB on relatively puny VMs, so I'm curious how much performance we'll actually get out of baremetal.

All that is to say: I think there are other avenues we should pursue and evaluate before we start introducing data consistency risk.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
