> Communicating between 11 different processes via stdin/stdout and exit codes, 
> even if the processes themselves are relatively simple, is fairly complex as 
> a whole.

Yes and no. Using multiple processes doesn't actually reduce the
complexity of the task as a whole. What it does do is make it crystal
clear to a maintainer or operator what information a given process has
and what it produces.

Sure, there are other ways to build purely functional components, but
everybody knows those components never actually wind up purely
functional; enforcing that discipline inside a single process is very
often impractical. Separate process spaces help slice the problem into
manageable pieces so that it's easy to determine how any given
component operates.

Additionally, we've had to rewrite and replace this beast once
already, and we're basically having to do it all at once. If we break
it into smaller components, we allow components to be tested in
isolation, we free ourselves to replace only specific subsets in the
future, and by using standard streams for communication, we create a
baseline we can leverage immediately as a testing hook.
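
To make the "testing hook" point concrete, here's a rough sketch of
the shape each component could take: read stdin, write stdout, signal
failure through the exit code. The transform itself is elided, and
this is only an illustration, not the Blueprint's actual code.

    package main

    import (
        "fmt"
        "io"
        "os"
    )

    func main() {
        input, err := io.ReadAll(os.Stdin)
        if err != nil {
            fmt.Fprintf(os.Stderr, "reading stdin: %v\n", err)
            os.Exit(1) // a nonzero exit tells the pipeline this step failed
        }

        // ... transform input into output here ...
        output := input

        if _, err := os.Stdout.Write(output); err != nil {
            fmt.Fprintf(os.Stderr, "writing stdout: %v\n", err)
            os.Exit(1)
        }
    }

Because every component has that shape, "component < mock-input >
observed-output" is a test harness you get for free.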

The major disadvantage of putting these tasks in separate process
spaces comes when there are large chunks of data that multiple
components need to share. And that's one of the primary things I
reviewed the Blueprint for: large data structures aren't being
repeatedly serialized and deserialized, and most of the interfaces are
fairly small.

> I would also like to bring up the idea that we really need to change ORT's 
> "pull" paradigm, or at least make the "pull" more efficient so that we don't 
> have thousands of ORT instances all making the same requests to TO

Changing the push/pull aspect of the model doesn't help with
efficiency. For that, you need an indication of which config diffs
need to be applied. And that particular question isn't any easier to
answer in either model; it's the same inputs and outputs.

I'm definitely +1 on making this a deployment decision, though. With
the model proposed in the Blueprint, it's merely a question of when
and how the aggregator is invoked. The ability to invoke from cron,
pdsh, ansible, puppet, a shell, or via some future tool we add to the
TO UI would be very helpful. Besides, a frequent pull model (aided by
the aforementioned differential data transfer we have to develop in
any case) is usually as fast as push and much more reliable. I fully
expect reasonably fast polls against TC at some point when it can
return 304s on 99.9% of requests.
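
The standard HTTP mechanism for that is the conditional request: cache
the ETag from the last response and send If-None-Match, so an
unchanged config costs the server a 304 and no body. A hedged sketch
of the client side, assuming ETag support lands; the endpoint URL
below is made up:

    package main

    import (
        "fmt"
        "net/http"
    )

    // poll fetches url, sending the previously seen ETag so the server
    // can answer 304 Not Modified when nothing has changed.
    func poll(client *http.Client, url, lastETag string) (changed bool, etag string, err error) {
        req, err := http.NewRequest(http.MethodGet, url, nil)
        if err != nil {
            return false, lastETag, err
        }
        if lastETag != "" {
            req.Header.Set("If-None-Match", lastETag)
        }
        resp, err := client.Do(req)
        if err != nil {
            return false, lastETag, err
        }
        defer resp.Body.Close()

        if resp.StatusCode == http.StatusNotModified {
            return false, lastETag, nil // the cheap common case: nothing to apply
        }
        // ... read the body and hand it to the next stage here ...
        return true, resp.Header.Get("ETag"), nil
    }

    func main() {
        changed, etag, err := poll(http.DefaultClient, "https://to.example.invalid/api/config", "")
        fmt.Println(changed, etag, err)
    }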

Push models also have different security implications that have to be
handled. It's doable, especially if TO supports two-way authenticated
TLS (it currently doesn't). But pull uses a security model people
already have their heads around. If I were deploying a push system,
I'd have to think very carefully about how all the various systems
were authenticated.
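
For a sense of scale, here's roughly what just the client half of
two-way authenticated TLS looks like in Go. The certificate paths are
placeholders, and again, this is hypothetical until TO supports it:

    package main

    import (
        "crypto/tls"
        "crypto/x509"
        "log"
        "net/http"
        "os"
    )

    // mtlsClient builds an HTTP client that both presents a client
    // certificate and verifies the server against a private CA.
    func mtlsClient(certFile, keyFile, caFile string) (*http.Client, error) {
        cert, err := tls.LoadX509KeyPair(certFile, keyFile)
        if err != nil {
            return nil, err
        }
        caPEM, err := os.ReadFile(caFile)
        if err != nil {
            return nil, err
        }
        caPool := x509.NewCertPool()
        caPool.AppendCertsFromPEM(caPEM)

        return &http.Client{
            Transport: &http.Transport{
                TLSClientConfig: &tls.Config{
                    Certificates: []tls.Certificate{cert}, // prove our identity
                    RootCAs:      caPool,                   // verify theirs
                },
            },
        }, nil
    }

    func main() {
        if _, err := mtlsClient("client.crt", "client.key", "ca.crt"); err != nil {
            log.Fatal(err)
        }
    }

And the server side, certificate issuance, and rotation all need the
same care.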

And by codifying the roles with clear inputs and outputs, we give CDN
operators the flexibility to modify those streams if necessary.

Some arbitrary ideas of varying value spring to mind: an operator may
have an in-house secrets management system that could serve as input
for the systems that require secrets. Or a sanitizer that turns
production data into test data via specialized cleaning routines,
which could be invoked between the TO data and the config generator.
Or a testing system that mocks inputs and compares outputs to a
standard library of answers, to catch regressions quickly and
efficiently (a sketch of that last one follows). Or any of a variety
of things that aren't mentioned here. Clean inputs and outputs make
those specific variations much easier to manage, test, and use. And as
our input and output formats change, it'll be much more feasible for
operators to manage those systems.
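
To sketch just the golden-output idea (every file name below is
hypothetical, not something specified in the Blueprint):

    package main

    import (
        "bytes"
        "fmt"
        "os"
        "os/exec"
    )

    // runGolden feeds a mock input file to a component on stdin and
    // compares the component's stdout against a known-good answer.
    func runGolden(binary, inputFile, goldenFile string) error {
        input, err := os.ReadFile(inputFile)
        if err != nil {
            return err
        }
        golden, err := os.ReadFile(goldenFile)
        if err != nil {
            return err
        }

        cmd := exec.Command(binary)
        cmd.Stdin = bytes.NewReader(input)
        got, err := cmd.Output() // captures stdout; a nonzero exit is an error
        if err != nil {
            return fmt.Errorf("%s failed: %w", binary, err)
        }
        if !bytes.Equal(got, golden) {
            return fmt.Errorf("%s: output differs from %s", binary, goldenFile)
        }
        return nil
    }

    func main() {
        if err := runGolden("./ort-make-configs", "mock-to-data.json", "expected-configs.txt"); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    }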

Lastly, Gmail is telling me that other people have replied since I
started this message. Hopefully, my points are still relevant. :)

On Mon, Apr 13, 2020 at 7:06 PM ocket 8888 <[email protected]> wrote:
>
> For what it's worth, I'd be +1 on re-examining "push" vs "pull" for ORT.
>
> On Mon, Apr 13, 2020, 16:46 Rawlin Peters <[email protected]> wrote:
>
> > I'm generally +1 on redesigning ORT with the removal of the features
> > you mentioned, but the one thing that worries me is the number of
> > unique binaries/executables involved (potentially 11). Communicating
> > between 11 different processes via stdin/stdout and exit codes, even
> > if the processes themselves are relatively simple, is fairly complex
> > as a whole. IMO I don't really see a problem with implementing it as a
> > single well-designed binary -- if it's Go, each proposed binary could
> > just be its own package instead, with each package only exporting one
> > high-level function. The main func would then be the "Aggregator" that
> > simply calls each package's public function in turn, passing the
> > output of one into the input of the next, checking for errors at each
> > step. I think that would make it much easier to debug and test as a
> > whole.
> >
> > I would also like to bring up the idea that we really need to change
> > ORT's "pull" paradigm, or at least make the "pull" more efficient so
> > that we don't have thousands of ORT instances all making the same
> > requests to TO, with TO having to hit the DB for every request even
> > though nothing has actually changed. Since we control ORT, we have
> > nearly 100% control over all TO API requests made, yet we have a
> > design that effectively DDoSes itself by default right now. Do we want to
> > tackle that problem as part of this redesign, or is that out of scope?
> >
> > - Rawlin
> >
> > On Thu, Apr 9, 2020 at 4:57 PM Robert O Butts <[email protected]> wrote:
> > >
> > > I've made a Blueprint proposing to rewrite ORT:
> > > https://github.com/apache/trafficcontrol/pull/4628
> > >
> > > If you have opinions on ORT, please read and provide feedback.
> > >
> > > In a nutshell, it's proposing to rewrite ORT in Go, in the "UNIX
> > > Philosophy" of small, "do one thing" apps.
> > >
> > > Importantly, the proposal **removes** the following ORT features:
> > >
> > > chkconfig - CentOS 7+ and SystemD don't use chkconfig, and moreover our
> > > default Profile runlevel is wrong and broken. But my knowledge of
> > > CentOS, SystemD, chkconfig, and runlevels isn't perfect; if I'm mistaken
> > > about this and you're using ORT to set chkconfig, please let us know ASAP.
> > >
> > > ntpd - ORT today has code to set ntpd config and restart the ntpd
> > > service. I have no idea why it was ever in charge of this, but this
> > > clearly seems to be the system's job, not ORT or TC's.
> > >
> > > interactive mode - I asked around, and couldn't find anyone using this.
> > > Does anyone use it? And feel it's essential to keep in ORT? And also feel
> > > that the way this proposal breaks up the app so that it's easy to request
> > > and compare files before applying them isn't sufficient?
> > >
> > > reval mode - This was put in because ORT was slow. ORT in master now
> > > takes 10-20s on our large CDN. Moreover, "reval" mode is no longer
> > > significantly faster than just applying everything. Does anyone feel
> > > otherwise?
> > >
> > > report mode - The functionality here is valuable. But the intention
> > > here is to replace "ORT report mode" with a pipelined set of app calls
> > > or a script to do the same thing. I.e., because it's "UNIX-Style" you
> > > can just "ort-to-get | ort-make-configs | ort-diff".
> > >
> > > package installation - This is the biggest feature the proposal removes,
> > > and probably the most controversial. The thought is: this isn't something
> > > ORT or Traffic Control should be doing. The same thing that manages the
> > > physical machine and/or operating system -- whether that's Ansible,
> > > Puppet, Chef, or a human System Administrator -- should be installing
> > > the OS packages for ATS and its plugins, just like it manages all the
> > > other packages on your system. ORT and TC should deploy configuration,
> > > not install things.
> > >
> > > So yeah, feedback welcome. Feel free to post it on the list here or the
> > > blueprint PR on github.
> > >
> > > Thanks,
> >
