rawlinp edited a comment on pull request #6017:
URL: https://github.com/apache/trafficcontrol/pull/6017#issuecomment-879391068


   > In practice, with a 1s cache, it's extremely unlikely
   
   I disagree. I think the race condition is extremely likely to occur, even 
with a 1 second cache, and _especially_ once `t3c` is polling TO more 
frequently (which is the goal).
   
   > We could handle that race by adding the cache time to the Update Status 
endpoint
   
   Different TOs could all have different `cache_ms` settings, so this doesn't 
really solve the race. They probably shouldn't have different cache settings in 
practice, but they _can_, which means we can't really depend on per-TO-instance 
settings.
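
To make the race concrete, here is a minimal, deterministic sketch of one interleaving I have in mind. Everything in it is hypothetical (`timedCache`, `staleRead`, the timings, and the assumption that the update flag is read uncached while the config data is served from a timed cache); it is not actual TO or `t3c` code.

```go
package main

import "fmt"

// timedCache is a hypothetical stand-in for a per-instance response cache
// with a fixed TTL. Time is passed in as explicit milliseconds so the
// example is deterministic and needs no sleeping.
type timedCache struct {
	ttlMS    int
	value    string
	cachedAt int
	filled   bool
}

// get returns the cached value while it is younger than ttlMS;
// otherwise it re-reads from the "database" and caches the result.
func (c *timedCache) get(nowMS int, db *string) string {
	if c.filled && nowMS-c.cachedAt < c.ttlMS {
		return c.value
	}
	c.value, c.cachedAt, c.filled = *db, nowMS, true
	return c.value
}

// staleRead replays one possible interleaving and returns what t3c sees.
func staleRead() (updatePending bool, config string) {
	db := "config v1"
	to := &timedCache{ttlMS: 1000} // assumed 1s cache on one TO instance

	to.get(0, &db)       // t=0ms: some client warms the cache with v1
	db = "config v2"     // t=100ms: an operator changes data...
	updatePending = true // ...and queues updates (assumed uncached)

	// t=200ms: t3c sees the update flag, then fetches config through the cache.
	config = to.get(200, &db)
	return updatePending, config
}

func main() {
	pending, cfg := staleRead()
	fmt.Println(pending, cfg) // prints "true config v1"
}
```

In this interleaving `t3c` is told an update is pending but is handed the old data, and if it then clears its update flag the new config is not applied until something else queues updates again.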
   
   I appreciate the intent of making TO more scalable by adding things like 
timed caches and RWR, but I'm not really sure it's worth the risk of 
sacrificing our data consistency. It seems far safer and more scalable to 
implement something like Cache Config Snapshots, where we can cache that data 
in-memory with 100% data consistency without the possibility of race 
conditions. Rather than complicate the entire API by adding a new layer that 
every endpoint has to go through, we'd have a single endpoint with an in-memory 
cache (that isn't timed and is easily invalidated whenever a new snapshot is 
taken). It just seems easier to make a single endpoint really scalable 
(special-built for `t3c`) instead of trying to make the entire API scalable by 
introducing data consistency risk.
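
As a rough sketch of the kind of cache I mean, assuming a single TO process and hypothetical names throughout (`Snapshot`, `SnapshotCache`, `Store`, `Load` are illustrative, not an existing API): entries never expire on a timer and are replaced only when a new snapshot is taken.

```go
package main

import (
	"fmt"
	"sync"
)

// Snapshot is a hypothetical, immutable bundle of the data t3c needs
// to generate config for one CDN.
type Snapshot struct {
	Version int    // incremented each time a new snapshot is taken
	Data    string // stand-in for the real config payload
}

// SnapshotCache keeps the latest snapshot per CDN in memory.
// There is no TTL: an entry changes only when Store is called.
type SnapshotCache struct {
	mu    sync.RWMutex
	byCDN map[string]Snapshot
}

// NewSnapshotCache returns an empty cache.
func NewSnapshotCache() *SnapshotCache {
	return &SnapshotCache{byCDN: make(map[string]Snapshot)}
}

// Store replaces the cached snapshot for a CDN. It is meant to be called
// at the moment a new snapshot is taken, which is the only invalidation.
func (c *SnapshotCache) Store(cdn, data string) Snapshot {
	c.mu.Lock()
	defer c.mu.Unlock()
	next := Snapshot{Version: c.byCDN[cdn].Version + 1, Data: data}
	c.byCDN[cdn] = next
	return next
}

// Load returns the current snapshot for a CDN, if one has been taken.
func (c *SnapshotCache) Load(cdn string) (Snapshot, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	s, ok := c.byCDN[cdn]
	return s, ok
}

func main() {
	cache := NewSnapshotCache()
	cache.Store("cdn1", "config A")
	cache.Store("cdn1", "config B") // new snapshot replaces the old one
	snap, _ := cache.Load("cdn1")
	fmt.Println(snap.Version, snap.Data)
}
```

Readers either see the previous snapshot in full or the new one in full, never a mix. With several TO instances, each one would additionally need some way to learn that a newer snapshot exists, which this sketch does not cover.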
   
   We can also improve the endpoints that `t3c` requests that seem to have the 
biggest impact on TODB load (e.g. `deliveryserviceservers`, `jobs`, etc). I'm 
pretty sure `t3c` requests every invalidation job that has ever been submitted 
every single time it processes a revalidation, so we can definitely improve 
that (https://github.com/apache/trafficcontrol/issues/5674), and `t3c` doesn't 
need the entirety of the `deliveryserviceservers` data, although I'm not sure 
it's necessary to improve that since those are going away in favor of 
topologies. However, if the alternative is data consistency risk, it might make 
sense to at least filter that API by CDN.
   
   We also have the IMS changes you recently added to `t3c`, and I'm sure that 
will help improve performance a bit.
   
   We're also running TODB on relatively puny VMs, so I'm curious how much 
performance we'll actually get out of bare metal.
   
   All that is to say: I think there are other avenues we should pursue and 
evaluate before we start introducing data consistency risk.
   



