I'm generally +1 on redesigning ORT with the removal of the features you mentioned, but the one thing that worries me is the number of unique binaries/executables involved (potentially 11). Communicating between 11 different processes via stdin/stdout and exit codes is fairly complex as a whole, even if the processes themselves are relatively simple. IMO there's no real problem with implementing it as a single well-designed binary -- if it's Go, each proposed binary could just be its own package instead, with each package exporting only one high-level function. The main func would then be the "Aggregator" that simply calls each package's public function in turn, passing the output of one into the input of the next and checking for errors at each step. I think that would make it much easier to debug and test as a whole.

I would also like to bring up the idea that we really need to change ORT's "pull" paradigm, or at least make the "pull" more efficient, so that we don't have thousands of ORT instances all making the same requests to TO, with TO having to hit the DB for every request even though nothing has actually changed. Since we control ORT, we control nearly 100% of the TO API requests made, yet right now we have a design that DDoSes our own TO by default. Do we want to tackle that problem as part of this redesign, or is that out of scope?

- Rawlin

On Thu, Apr 9, 2020 at 4:57 PM Robert O Butts <[email protected]> wrote:
>
> I've made a Blueprint proposing to rewrite ORT:
> https://github.com/apache/trafficcontrol/pull/4628
>
> If you have opinions on ORT, please read and provide feedback.
>
> In a nutshell, it's proposing to rewrite ORT in Go, in the "UNIX
> Philosophy" of small, "do one thing" apps.
>
> Importantly, the proposal **removes** the following ORT features:
>
> chkconfig - CentOS 7+ and SystemD don't use chkconfig, and moreover our
> default Profile runlevel is wrong and broken. But my knowledge of
> CentOS,SystemD,chkconfig,runlevels isn't perfect, if I'm mistaken about
> this and you're using ORT to set chkconfig, please let us know ASAP.
>
> ntpd - ORT today has code to set ntpd config and restart the ntpd service.
> I have no idea why it was ever in charge of this, but this clearly seems to
> be the system's job, not ORT or TC's.
>
> interactive mode - I asked around, and couldn't find anyone using this.
> Does anyone use it? And feel it's essential to keep in ORT? And also feel
> that the way this proposal breaks up the app so that it's easy to request
> and compare files before applying them isn't sufficient?
>
> reval mode - This was put in because ORT was slow. ORT in master now takes
> 10-20s on our large CDN. Moreover, "reval" mode is no longer significantly
> faster than just applying everything. Does anyone feel otherwise?
>
> report mode - The functionality here is valuable. But intention here is to
> replace "ORT report mode" with a pipelined set of app calls or a script to
> do the same thing. I.e. because it's "UNIX-Style" you can just "ort-to-get
> | ort-make-configs | ort-diff".
>
> package installation - This is the biggest feature the proposal removes,
> and probably the most controversial. The thought is: this isn't something
> ORT or Traffic Control should be doing. The same thing that manages the
> physical machine and/or operating system -- whether that's Ansible, Puppet,
> Chef, or a human System Administrator -- should be installing the OS
> packages for ATS and its plugins, just like it manages all the other
> packages on your system. ORT and TC should deploy configuration, not
> install things.
>
> So yeah, feedback welcome. Feel free to post it on the list here or the
> blueprint PR on github.
>
> Thanks,
