If all the binaries are compiled and shipped together all the time, I could go 
either way.  The main gain I can see though would be in debugging and test 
scope with small binaries.  It's easier to say affirmatively that change A's 
scope of effect is limited to one testable object of 4 than it is to wonder 
whether the monolith has other dependent codepaths that have to be checked as 
well.  I think having one binary to rule them all fosters a human habit of 
scope creep, making it do just one more thing instead of focusing on a specific 
set of jobs.  I'm a huge fan of putting more responsibility on the system 
operators to use their native toolsets to facilitate several of the jobs ORT 
has traditionally done.  That helps the project lower its overall maintenance 
obligation and provides greater flexibility so it's easier to break into new 
environment configurations.

I'm also not a fan of (-1 on) push instead of pull.  It trades the DDoS problem 
you mention for having to manage all the orchestration surrounding when things 
apply and what happens in a whole new set of error cases where a push message 
gets lost in the network somewhere.  Even if you think a message bus of some 
kind makes it better, that just adds another layer of complexity and another 
fault domain to the overall solution.  A fast-enough poll is also 
indistinguishable from push.  Instead, I think it's more worth looking at how 
to "take the mass out of the hammer".  We're making significant strides to 
reduce our most expensive queries now, and that's only going to get better with 
flexible cachegroups.  HTTP caching could get us a very long way for things 
like making ORT take a smaller resource hit or making TP more responsive.  If 
the database queries are still too much, we could look at splitting read 
queries off onto a separate connection string for multiple read-only (RO) 
replicas.

Jonathan G

On 4/13/20, 4:46 PM, "Rawlin Peters" <[email protected]> wrote:

    I'm generally +1 on redesigning ORT with the removal of the features
    you mentioned, but the one thing that worries me is the number of
    unique binaries/executables involved (potentially 11). Communicating
    between 11 different processes via stdin/stdout and exit codes, even
    if the processes themselves are relatively simple, is fairly complex
    as a whole. IMO I don't really see a problem with implementing it as a
    single well-designed binary -- if it's Go, each proposed binary could
    just be its own package instead, with each package only exporting one
    high-level function. The main func would then be the "Aggregator" that
    simply calls each package's public function in turn, passing the
    output of one into the input of the next, checking for errors at each
    step. I think that would make it much easier to debug and test as a
    whole.

    I would also like to bring up the idea that we really need to change
    ORT's "pull" paradigm, or at least make the "pull" more efficient so
    that we don't have thousands of ORT instances all making the same
    requests to TO, with TO having to hit the DB for every request even
    though nothing has actually changed. Since we control ORT we have
    nearly 100% control over all TO API requests made, yet we have a
    design that DDoSes itself by default right now. Do we want to
    tackle that problem as part of this redesign, or is that out of scope?

    - Rawlin

    On Thu, Apr 9, 2020 at 4:57 PM Robert O Butts <[email protected]> wrote:
    >
    > I've made a Blueprint proposing to rewrite ORT:
    > https://urldefense.com/v3/__https://github.com/apache/trafficcontrol/pull/4628__;!!CQl3mcHX2A!WP8MIrdRGn9EvXJUOSFoKai78dFn2hTY6cWc-BQ29yg69KNi_bYeuPFZaKxRSgsU2s3r$
    >
    > If you have opinions on ORT, please read and provide feedback.
    >
    > In a nutshell, it's proposing to rewrite ORT in Go, in the "UNIX
    > Philosophy" of small, "do one thing" apps.
    >
    > Importantly, the proposal **removes** the following ORT features:
    >
    > chkconfig - CentOS 7+ and SystemD don't use chkconfig, and moreover our
    > default Profile runlevel is wrong and broken. But my knowledge of
    > CentOS, SystemD, chkconfig, and runlevels isn't perfect; if I'm mistaken
    > about this and you're using ORT to set chkconfig, please let us know ASAP.
    >
    > ntpd - ORT today has code to set ntpd config and restart the ntpd service.
    > I have no idea why it was ever in charge of this, but this clearly seems to
    > be the system's job, not ORT or TC's.
    >
    > interactive mode - I asked around, and couldn't find anyone using this.
    > Does anyone use it? And feel it's essential to keep in ORT? And also feel
    > that the way this proposal breaks up the app so that it's easy to request
    > and compare files before applying them isn't sufficient?
    >
    > reval mode - This was put in because ORT was slow. ORT in master now takes
    > 10-20s on our large CDN. Moreover, "reval" mode is no longer significantly
    > faster than just applying everything. Does anyone feel otherwise?
    >
    > report mode - The functionality here is valuable. But the intention here is to
    > replace "ORT report mode" with a pipelined set of app calls or a script to
    > do the same thing. I.e. because it's "UNIX-Style" you can just "ort-to-get
    > | ort-make-configs | ort-diff".
    >
    > package installation - This is the biggest feature the proposal removes,
    > and probably the most controversial. The thought is: this isn't something
    > ORT or Traffic Control should be doing. The same thing that manages the
    > physical machine and/or operating system -- whether that's Ansible, Puppet,
    > Chef, or a human System Administrator -- should be installing the OS
    > packages for ATS and its plugins, just like it manages all the other
    > packages on your system. ORT and TC should deploy configuration, not
    > install things.
    >
    > So yeah, feedback welcome. Feel free to post it on the list here or the
    > blueprint PR on github.
    >
    > Thanks,

