Rawlin never even mentioned the word "Push" :) He was also referring to the
potential of thousands of clients requesting many of the same endpoints all
at once.  Our CDN is good at that; Traffic Ops is not.  Anyway, I think
that problem can and should be solved outside the scope of the ORT rewrite
(for now).

I agree that having smaller components that do specific things is generally
a good thing, but I also think there are diminishing returns, and having too
many causes more problems than it is worth.  I also think the components
should be packaged together so that we aren't trying to manage 11 (or
whatever) different RPMs.

There are some proposed executables that I think we can consolidate:
- Config Generator and Config File Preprocessor should be one thing that
takes in TO data and spits out config files.
- Server config readiness and ATS plugin readiness can just be a "system
readiness verifier".
- The restart determiner and service reloader can probably be one thing
that takes flags, maybe with a "report" mode (a rough sketch follows).
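
As a rough illustration of that last bullet, a combined restart/reload tool
might look something like this (the flag, service name, and commands are
just a sketch from me, not something the blueprint specifies):

// Hypothetical sketch of a combined restart-determiner/service-reloader.
package main

import (
    "flag"
    "fmt"
    "os"
    "os/exec"
)

func main() {
    report := flag.Bool("report", false, "only report what would be done")
    flag.Parse()

    needsRestart, needsReload := determineActions()

    if *report {
        fmt.Printf("restart needed: %v, reload needed: %v\n", needsRestart, needsReload)
        return
    }
    if needsRestart {
        run("systemctl", "restart", "trafficserver")
    } else if needsReload {
        run("traffic_ctl", "config", "reload")
    }
}

// determineActions would inspect which config files changed; stubbed here.
func determineActions() (restart, reload bool) { return false, true }

// run executes a command and exits nonzero on failure, so callers (cron,
// ansible, an aggregator) can see the result in the exit code.
func run(name string, args ...string) {
    cmd := exec.Command(name, args...)
    cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
    if err := cmd.Run(); err != nil {
        fmt.Fprintf(os.Stderr, "%s failed: %v\n", name, err)
        os.Exit(1)
    }
}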


Thanks,
Dave


On Mon, Apr 13, 2020 at 8:32 PM Chris Lemmons <[email protected]> wrote:

> > Communicating between 11 different processes via stdin/stdout and exit
> > codes, even if the processes themselves are relatively simple, is fairly
> > complex as a whole.
>
> Yes and no. Using multiple processes doesn't actually reduce the
> complexity of the task as a whole. What it does do is make it crystal
> clear to a maintainer or operator what information a given process has
> and what it produces.
>
> Sure, there are other ways to do pure functional components, but
> everybody knows those components never actually wind up purely
> functional. It's very often impractical. Separate process spaces help
> slice the problem into manageable pieces so that it's easy to
> determine how any given component operates.
>
> Additionally, we've had to rewrite and replace this beast once and
> we're basically having to do it all at once. If we break it into
> smaller components, we allow components to be tested in isolation, we
> free ourselves to replace only specific subsets in the future, and by
> using standard streams for communication, we create a baseline we can
> leverage immediately as a testing hook.
>
> The major disadvantage of putting these tasks in separate process
> spaces is when there are large chunks of data that multiple components
> need to share. And that's one of the primary things I reviewed the
> blueprint for: large data structures aren't being repeatedly serialized
> and deserialized, and most of the interfaces are fairly small.
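>
> To make that concrete, here's a minimal sketch (purely hypothetical, not
> from the blueprint) of one such component: it reads TO data as JSON on
> stdin and writes a generated config body on stdout, so it can be piped
> and tested in isolation:
>
> // Hypothetical "do one thing" component; field and record names are
> // illustrative only.
> package main
>
> import (
>     "encoding/json"
>     "fmt"
>     "os"
> )
>
> type serverData struct {
>     HostName string `json:"hostName"`
>     TCPPort  int    `json:"tcpPort"`
> }
>
> func main() {
>     var data serverData
>     if err := json.NewDecoder(os.Stdin).Decode(&data); err != nil {
>         fmt.Fprintln(os.Stderr, "decoding input:", err)
>         os.Exit(1) // nonzero exit code signals failure to the caller
>     }
>     fmt.Printf("CONFIG proxy.config.proxy_name STRING %s\n", data.HostName)
>     fmt.Printf("CONFIG proxy.config.http.server_ports STRING %d\n", data.TCPPort)
> }
>
> Feed it a saved JSON file and diff the output against a known-good config
> and you have the testing hook described above.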
>
> > I would also like to bring up the idea that we really need to change
> > ORT's "pull" paradigm, or at least make the "pull" more efficient so that
> > we don't have thousands of ORT instances all making the same requests to TO
>
> Changing the push/pull of the model doesn't help with efficiency. For
> that, you need an indication of what config diffs need to be applied.
> And that particular question isn't any easier to answer in either
> model, it's the same inputs and outputs.
>
> I'm definitely +1 on making this a deployment decision, though. With
> the model proposed in the Blueprint, it's merely a question of when
> and how the aggregator is invoked. The ability to invoke from cron,
> pdsh, ansible, puppet, a shell, or via some future tool we add to the
> TO UI would be very helpful. Besides, a frequent pull model (aided by
> the aforementioned differential data transfer we have to develop in
> any case) is usually as fast as push and much more reliable. I fully
> expect reasonably fast polls against TC at some point when it can
> return 304s on 99.9% of requests.
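>
> For example, the cheap poll could just be a conditional GET (standard
> HTTP semantics; the URL and saved ETag here are made up) so TO/TC can
> answer 304 when nothing has changed:
>
> package main
>
> import (
>     "fmt"
>     "net/http"
> )
>
> func main() {
>     req, err := http.NewRequest(http.MethodGet, "https://to.example.test/api/config", nil)
>     if err != nil {
>         fmt.Println("building request:", err)
>         return
>     }
>     req.Header.Set("If-None-Match", `"etag-from-last-poll"`) // saved from the previous poll
>     resp, err := http.DefaultClient.Do(req)
>     if err != nil {
>         fmt.Println("poll failed:", err)
>         return
>     }
>     defer resp.Body.Close()
>     if resp.StatusCode == http.StatusNotModified {
>         fmt.Println("304: nothing changed, nothing to do")
>         return
>     }
>     fmt.Println("config changed; fetch and apply")
> }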
>
> Push models also have different security implications that have to be
> handled. It's doable, especially if TO supports two-way authenticated
> TLS (it currently doesn't). But pull uses a security model people
> already have their heads around. If I were deploying a push system,
> I'd have to think very carefully about how all the various systems
> were authenticated.
>
> And by codifying the roles with clear inputs and outputs, we give CDN
> operators the flexibility to modify those streams if necessary.
>
> Some arbitrary ideas of varying value spring to mind: An operator may
> have an in-house secrets management system that could serve as input
> for the systems that require secrets. Or have a sanitizer that turns
> production data into test data via specialized cleaning routines,
> which could be invoked between the TO data and the config generator.
> Or a testing system that mocks inputs and compares outputs to a
> standard library of answers to validate regressions quickly and
> efficiently. Or any of a variety of things that aren't mentioned here.
> Clean inputs and outputs make those specific variations much easier to
> manage, test, and use. And as our input and output formats change,
> it'll be much more feasible for operators to manage those systems.
>
> Lastly, Gmail is telling me that other people have replied since I
> started this message. Hopefully, my points are still relevant. :)
>
> On Mon, Apr 13, 2020 at 7:06 PM ocket 8888 <[email protected]> wrote:
> >
> > For what it's worth, I'd be +1 on re-examining "push" vs "pull" for ORT.
> >
> > On Mon, Apr 13, 2020, 16:46 Rawlin Peters <[email protected]> wrote:
> >
> > > I'm generally +1 on redesigning ORT with the removal of the features
> > > you mentioned, but the one thing that worries me is the number of
> > > unique binaries/executables involved (potentially 11). Communicating
> > > between 11 different processes via stdin/stdout and exit codes, even
> > > if the processes themselves are relatively simple, is fairly complex
> > > as a whole. IMO I don't really see a problem with implementing it as a
> > > single well-designed binary -- if it's Go, each proposed binary could
> > > just be its own package instead, with each package only exporting one
> > > high-level function. The main func would then be the "Aggregator" that
> > > simply calls each package's public function in turn, passing the
> > > output of one into the input of the next, checking for errors at each
> > > step. I think that would make it much easier to debug and test as a
> > > whole.
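> > >
> > > To illustrate the shape I mean (a hypothetical sketch only; the stub
> > > funcs below stand in for the per-package exported functions, and every
> > > name is made up rather than taken from the blueprint):
> > >
> > > package main
> > >
> > > import "log"
> > >
> > > type toData struct{ raw string }
> > > type configs struct{ files map[string]string }
> > >
> > > // Each of these would live in its own package and export one function.
> > > func fetchTOData() (toData, error)              { return toData{raw: "{}"}, nil }
> > > func generateConfigs(d toData) (configs, error) { return configs{files: map[string]string{}}, nil }
> > > func applyConfigs(c configs) error              { return nil }
> > >
> > > // main is the "Aggregator": it calls each step in turn, passing the
> > > // output of one into the input of the next and checking errors.
> > > func main() {
> > >     data, err := fetchTOData()
> > >     if err != nil {
> > >         log.Fatalf("fetching TO data: %v", err)
> > >     }
> > >     cfgs, err := generateConfigs(data)
> > >     if err != nil {
> > >         log.Fatalf("generating configs: %v", err)
> > >     }
> > >     if err := applyConfigs(cfgs); err != nil {
> > >         log.Fatalf("applying configs: %v", err)
> > >     }
> > > }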
> > >
> > > I would also like to bring up the idea that we really need to change
> > > ORT's "pull" paradigm, or at least make the "pull" more efficient so
> > > that we don't have thousands of ORT instances all making the same
> > > requests to TO, with TO having to hit the DB for every request even
> > > though nothing has actually changed. Since we control ORT we have
> > > nearly 100% control over all TO API requests made, yet we have a
> > > design that DDoSes itself by default right now. Do we want to
> > > tackle that problem as part of this redesign, or is that out of scope?
> > >
> > > - Rawlin
> > >
> > > On Thu, Apr 9, 2020 at 4:57 PM Robert O Butts <[email protected]> wrote:
> > > >
> > > > I've made a Blueprint proposing to rewrite ORT:
> > > > https://github.com/apache/trafficcontrol/pull/4628
> > > >
> > > > If you have opinions on ORT, please read and provide feedback.
> > > >
> > > > In a nutshell, it's proposing to rewrite ORT in Go, in the "UNIX
> > > > Philosophy" of small, "do one thing" apps.
> > > >
> > > > Importantly, the proposal **removes** the following ORT features:
> > > >
> > > > chkconfig - CentOS 7+ and SystemD don't use chkconfig, and moreover
> > > > our default Profile runlevel is wrong and broken. But my knowledge of
> > > > CentOS, SystemD, chkconfig, and runlevels isn't perfect; if I'm
> > > > mistaken about this and you're using ORT to set chkconfig, please let
> > > > us know ASAP.
> > > >
> > > > ntpd - ORT today has code to set ntpd config and restart the ntpd
> > > > service. I have no idea why it was ever in charge of this, but this
> > > > clearly seems to be the system's job, not ORT or TC's.
> > > >
> > > > interactive mode - I asked around, and couldn't find anyone using
> > > > this. Does anyone use it? And feel it's essential to keep in ORT? And
> > > > also feel that the way this proposal breaks up the app so that it's
> > > > easy to request and compare files before applying them isn't
> > > > sufficient?
> > > >
> > > > reval mode - This was put in because ORT was slow. ORT in master now
> > > > takes 10-20s on our large CDN. Moreover, "reval" mode is no longer
> > > > significantly faster than just applying everything. Does anyone feel
> > > > otherwise?
> > > >
> > > > report mode - The functionality here is valuable. But the intention
> > > > here is to replace "ORT report mode" with a pipelined set of app calls
> > > > or a script to do the same thing. I.e. because it's "UNIX-Style" you
> > > > can just "ort-to-get | ort-make-configs | ort-diff".
> > > >
> > > > package installation - This is the biggest feature the proposal
> > > > removes, and probably the most controversial. The thought is: this
> > > > isn't something ORT or Traffic Control should be doing. The same thing
> > > > that manages the physical machine and/or operating system -- whether
> > > > that's Ansible, Puppet, Chef, or a human System Administrator -- should
> > > > be installing the OS packages for ATS and its plugins, just like it
> > > > manages all the other packages on your system. ORT and TC should deploy
> > > > configuration, not install things.
> > > >
> > > > So yeah, feedback welcome. Feel free to post it on the list here or
> > > > the blueprint PR on github.
> > > >
> > > > Thanks,
> > >
>
