jhg03a commented on pull request #6017: URL: https://github.com/apache/trafficcontrol/pull/6017#issuecomment-879464209
> Different TOs could all have different `cache_ms` settings, so this doesn't really solve the race. They probably shouldn't have different cache settings in practice, but they _can_, which means we can't really depend on per-TO-instance settings.

That problem is out of scope: it's the job of external tooling to ensure it doesn't happen. If it should be in scope, we could leverage our existing profile/parameter system to take the setting out of config files entirely and therefore always be consistent.

> I appreciate the intent of making TO more scalable by adding things like timed caches and RWR, but I'm not really sure it's worth the risk of sacrificing our data consistency. It seems far safer and more scalable to implement something like Cache Config Snapshots, where we can cache that data in-memory with 100% data consistency and no possibility of race conditions. Rather than complicate the entire API by adding a new layer that every endpoint has to go through, we'd have a single endpoint with an in-memory cache (one that isn't timed and is easily invalidated whenever a new snapshot is taken). It just seems easier to make a single endpoint really scalable (purpose-built for `t3c`) than to try to make the entire API scalable by introducing data consistency risk.

Both approaches have merit and precedent. In this case, though, the risk-versus-reward tradeoff seems to me to favor moving ahead with this one. Another important point is that this work is done, not sitting on a roadmap or in a blueprint. Whenever Cache Config Snapshots comes about, nothing says this extra caching layer has to stay (although even CCS would benefit from it too). The other improvements that have been suggested are valid as well, and some are even in progress. When it comes to these TODB/API performance issues, I haven't seen any reproduction case that can be measured or evaluated reliably, which is why a multifaceted approach isn't a bad thing.
It's acknowledged that the cache config endpoints are some of the heaviest hitters, and at large scale, how your postgres instance is configured, what it runs on, and where it sits in the network all play a big part in overall success rates. This PR significantly alleviates that scalability concern without requiring any postgres or infrastructure changes.
