Re: The Gump3 branch

Leo Simons Sun, 09 Jan 2005 12:30:27 -0800

On 09-01-2005 18:28, "Adam R. B. Jack" <[EMAIL PROTECTED]> wrote:
>> Ooh, long e-mail! I'm gonna try and split this up... :-D
> 
> Sorry Dude, I got excited. :-)


Excitement is good!

> I wonder if there is as much significant difference between
> Gump2 and Gump3 as I first thought.

Probably not. Then again, I don't know what you thought...

> I have a slight deja-vu feeling here. You've built a nice (clean) start,
> like Sam did, but to get from this to a live running system will take much
> the same work that I added last time, and I'm not sure the key problems of
> Gump2 have been understood/corrected. I'm going to try (over time) to list
> every place in Gump2 that I feel would be as bad in Gump3 so we can address
> them. This isn't me being petty, but me trying to pressure test this new
> approach against my understanding of reality (for all its/my warts).

It's a good idea. By all means. Software architecture is *hard*.

> [ BTW: I still could use help with IOC. I have a crude understanding of it,
> but please don't forget to enlighten me if you see I'm missing a point.]

That takes years! :-D. Think military and about "who is in command". Maybe
you're familiar with patterns like "chain of responsibility". IOC is a
pattern in the same sense. It's a tree of commands. General at the top.

> Sure, I see that components ought not need to communicate directly. In Gump2
> we have a model tree (workspace/modules/projects) and a (theoretically
> separate, but not) tree of results. That tree is for a few projects, or all,
> based off the filter of work to do. As components do work on that tree they
> store data at the right level (run/workspace/module/project), perhaps even
> setting state (failed, etc.). This is Gump2, and (as I hear it) Gump3, no
> differences.

There are a few, I think. I had the hardest time fully reading through the
gump2 model code. I decided I needed to start with the XML, retrieve the
fundamental abstractions, and rewrite the tree. It was so much fun I just
kept going.

The gump3 tree is totally passive, and much closer to the way a
mathematician would build a tree. You can let loose algorithms on it that
were figured out in the 30s (e.g. the topological sort in the walker code is
one of those).
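
To illustrate the point about letting classic algorithms loose on a passive
tree, here is a minimal sketch of a topological sort (Kahn's algorithm) over
plain objects. The Project class and its "dependencies" attribute are
assumptions made up for this example; this is not the actual gump3 model or
walker code.

```python
from collections import deque


class Project:
    """Hypothetical passive node: holds data, has no behaviour of its own."""

    def __init__(self, name):
        self.name = name
        self.dependencies = []  # projects this one needs built first


def topological_sort(projects):
    """Return projects in build order (Kahn's algorithm); raise on a cycle."""
    # Number of not-yet-ordered dependencies per project.
    pending = {p: len(p.dependencies) for p in projects}
    # Reverse edges: which projects are waiting on each project.
    dependees = {p: [] for p in projects}
    for p in projects:
        for dep in p.dependencies:
            dependees[dep].append(p)

    ready = deque(p for p in projects if pending[p] == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for waiting in dependees[current]:
            pending[waiting] -= 1
            if pending[waiting] == 0:
                ready.append(waiting)

    if len(order) != len(projects):
        raise ValueError("dependency cycle detected")
    return order


ant = Project("ant")
junit = Project("junit")
gump = Project("gump")
junit.dependencies.append(ant)
gump.dependencies.extend([ant, junit])

build_order = [p.name for p in topological_sort([gump, junit, ant])]
```

The model objects know nothing about sorting; the algorithm only reads their
attributes, which is what keeps the tree passive.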

The gump3 tree does not do any kind of validation. It does only the most
minimal of defaults.

The gump3 tree is more fully "normalized". All references are fully two-way,
like with DOM. The difference between <depend/> and <option/> sucks
conceptually, so now the "option"-ness is just a property of the edge that
connects two vertices.

I think it's much simpler.
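
A rough sketch of what "option-ness as a property of the edge" with two-way
references could look like. The class and attribute names here (Dependency,
add_dependency, optional) are hypothetical and chosen for illustration only;
the real gump3 model may differ.

```python
class Dependency:
    """Hypothetical edge between two projects; 'optional' is just a flag."""

    def __init__(self, dependency, dependee, optional=False):
        self.dependency = dependency  # the project being depended on
        self.dependee = dependee      # the project that needs it
        self.optional = optional


class Project:
    """Hypothetical vertex that can see its edges in both directions."""

    def __init__(self, name):
        self.name = name
        self.dependencies = []  # edges where this project is the dependee
        self.dependees = []     # edges where this project is the dependency

    def add_dependency(self, other, optional=False):
        """Create one edge and register it on both ends."""
        edge = Dependency(dependency=other, dependee=self, optional=optional)
        self.dependencies.append(edge)
        other.dependees.append(edge)
        return edge


ant = Project("ant")
junit = Project("junit")
xerces = Project("xerces")

ant.add_dependency(xerces)                # plays the role of <depend/>
ant.add_dependency(junit, optional=True)  # plays the role of <option/>

required = [e.dependency.name for e in ant.dependencies if not e.optional]
```

Both kinds of dependency live in the same list, so code that does not care
about the distinction never has to look at it.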

> I feel it is that tree that is the weakness people consider "bloat". Not
> its memory size, but its complexity, all the data stored in there -- and
> the fact it is a "batch". That is a key similarity between Gump2/Gump3 and
> (IMHO) a key issue to address.

Right.

> Part of the problem is ordering/sequencing. The CVS updating would not halt
> all efforts on a module (builds would occur) 'cos the CVS failed if it had a
> "semi-fresh" copy. (This was due to SF.net CVS being so flakey for so long
> even for Gump-wise stable things like JUnit.) As such, prior to CVS updating
> we needed to bring some "stats/history" information into memory, which enforces
> an implicit dependency. [Note: Stats Actor today stores Stats on the Tree,
> so users (CVS Actor) just ask for it from there, they don't talk directly.]

That's a big part of the problem. The solution is in the back of my head,
nearly constantly as I look at gump. Basically these kinds of decisions are
all encapsulated into the graphical algebra formulae Stefano and I found in
September. It would be real nice to meet face-to-face so we could talk about
that one!

> I know you can do "inter component communications" w/ Python properties,
> Gump2 does, but it has no "contract" (as Stefano would say) it is not clean,
> it is intricate internals knowledge from one component to another. It is
> stuff like this (and order dependencies like this) that ties components
> together, and keeps things fat. [Gump2 at least used typed member
> data/methods on the tree in order to allow some contracts.]

That's a fundamental difference right there! Strong typing is the way we
write contracts in Java, but that really doesn't work as well in Python. We
miss the interface keyword. Python OO needs to be built for dynamism. Take a
look at how hard the Zope people tried (and failed) to add that in, how
hard that has hit them in the face, and how bloated their design is now!

The way to specify contracts in Python is to document them.

"The CvsUpdater plugin will set a string property cvs_update_log on each
module that is of type 'cvs'. The property contains the log output from the
cvs update command of course."

That's a contract right there. Solidify the contract in a unit test for the
updater. Model stays clean, and blissfully unaware.
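
As a sketch of how that documented contract could be pinned down in a unit
test: everything below (the Module stand-in, the visit_module hook, the
injected run_cvs_update callable) is an assumption for illustration, not the
real gump3 plugin.

```python
import unittest


class Module:
    """Hypothetical stand-in for a passive model object."""

    def __init__(self, name, repository_type):
        self.name = name
        self.repository_type = repository_type


class CvsUpdater:
    """Sketch of a plugin. Contract: sets a string property cvs_update_log
    on each module of type 'cvs'; other modules are left untouched."""

    def __init__(self, run_cvs_update):
        # The command runner is passed in so a test can fake it.
        self.run_cvs_update = run_cvs_update

    def visit_module(self, module):
        if module.repository_type == "cvs":
            module.cvs_update_log = self.run_cvs_update(module)


class CvsUpdaterContractTest(unittest.TestCase):
    def setUp(self):
        # Fake runner: returns canned output instead of calling cvs.
        self.updater = CvsUpdater(lambda module: "U build.xml\n")

    def test_sets_log_on_cvs_modules(self):
        module = Module("ant", "cvs")
        self.updater.visit_module(module)
        self.assertIsInstance(module.cvs_update_log, str)

    def test_leaves_other_modules_alone(self):
        module = Module("some-svn-module", "svn")
        self.updater.visit_module(module)
        self.assertFalse(hasattr(module, "cvs_update_log"))
```

Saved as a test module, this can be run with "python -m unittest". The Module
class never mentions cvs_update_log; only the plugin and its test know about
the property.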

> What you are suggesting in almost exactly how Gump2 works, and is (I fear)
> where the thoughts to "bloat" come from.

The difference with this (scary!) dynamic approach that I suggested (which
is not as dynamic as the one Sam built, because that one was too difficult
to understand for us mere mortals) and gump2 is that the bloat is not in the
actual model code. The bloat is spread out over all those little to-be-built
plugins that have the little simple contracts like the one above.

It might be the case that we need some conventions for things like property
names (a plugin named gump.plugins.updater.CvsUpdater might prefix property
names with ctx_updater_cvs_updater), but I'd like to see a problem arise
before we start with those conventions.

> I'm a PIPE lover as much as the next guy, but simple flat stream pipes are
> not what we are building. Our components use complex results. Do we need
> contracts for those, or things (like DOM tree/XML structures) that we can
> persist/stream/validate? [How does Cocoon address this?]

You probably don't want to know :-D. Cocoon has generalized concepts of
generators, transformers, serializers, pipelines, sitemaps, etc. It has so
many concepts it hurts. Best learned by reading a book I suppose.

>> Without steps. That "|" there in gump is achieved by setting a property on a
>> piece of the model.
> 
> As with Gump2, but the properties grow and need management. They (and
> implicit dependencies) are the bloat.

I think they need _less_ management. And where they need management, that
management is _not_ the task of the core engine or of the model
implementation code, but the task of the plugins and their developers.
Another pattern I learned at Avalon: separation of concerns.

The implicit dependencies problem is solved through IOC. I'm confident
that'll work.

> I think *the* key problem with Gump2 is "what is core" and "what can be
> plugged in". Maybe I (and you) are getting a little carried away with what
> can be a plug-in, and maybe too many things are invalidly coded as such. Is
> "historical information" a fundamental service or some swappable component?
> [Please forgive me if I fail to know the correct terminology for 'corn
> concerns' or whatever. Perhaps teach me what I need to communicate more
> clearly with you.]

I read you loud and clear. We're building real software here, not some
pristine reusable white castle. (though that's fun too!)

The answer is deceptively simple. Everything is a component. Everything is
swappable. The only thing that is not swappable is the bit of code or
configuration that glues together the swappable bits (in our case,
config.py).

As long as every bit of code that does logging takes a "log" argument in its
__init__, as long as every class out there doesn't depend on some global
property being available, absolutely everything is swappable.
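
A small sketch of that rule in practice: constructor injection with a single
piece of glue code. The names (CvsUpdater.update, ListLog, build_plugins) are
made up for this example, and build_plugins only stands in for the role the
message gives to config.py.

```python
import logging


class CvsUpdater:
    """Hypothetical plugin: everything it needs arrives via __init__."""

    def __init__(self, log):
        self.log = log

    def update(self, module_name):
        self.log.info("updating %s", module_name)
        return "updated " + module_name


class ListLog:
    """Swapped-in logger that records messages; handy in tests."""

    def __init__(self):
        self.messages = []

    def info(self, message, *args):
        self.messages.append(message % args)


def build_plugins(log):
    """The only non-swappable part: glue that wires components together."""
    return [CvsUpdater(log)]


# Production-style wiring: a standard library logger.
plugins = build_plugins(logging.getLogger("gump"))

# Test-style wiring: same component, different logger, no globals touched.
fake_log = ListLog()
test_plugins = build_plugins(fake_log)
result = test_plugins[0].update("ant")
```

Because CvsUpdater never reaches for a global logger, swapping the logger is
a one-line change in the glue code.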

> The problem with Gump2 (and why it is a batch, and less able to be
> incremental/split) is that we have metadata loading as a stage, and not "on
> demand".

Hmm. I disagree. Just about the very first demand to satisfy is that the
project tree does not contain circular dependencies. And to be sure of that,
you really need to load all the metadata in.

If you don't do that, you might encounter a cycle halfway through, and then
it becomes difficult to recover.
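
For illustration, a minimal depth-first cycle check over a fully loaded
dependency map. The dict-of-names input is an assumption made to keep the
sketch short; it is not how gump stores its metadata.

```python
def find_cycle(dependencies):
    """Return one dependency cycle as a list of names, or None.

    'dependencies' maps a project name to the names it depends on.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {name: UNVISITED for name in dependencies}
    path = []

    def visit(name):
        state[name] = IN_PROGRESS
        path.append(name)
        for dep in dependencies.get(name, ()):
            if state.get(dep, UNVISITED) == IN_PROGRESS:
                # Back edge: the cycle is the tail of the current path.
                return path[path.index(dep):] + [dep]
            if state.get(dep, UNVISITED) == UNVISITED:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[name] = DONE
        return None

    for name in list(dependencies):
        if state[name] == UNVISITED:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


cyclic = find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]})
acyclic = find_cycle({"gump": ["ant", "junit"], "junit": ["ant"], "ant": []})
```

The check can only vouch for the edges it has been given, which is the
argument for loading all the metadata up front. The recursion is fine for a
sketch; a very deep graph would want an explicit stack.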

> As such we blast (we hope) through the whole metadata, building a
> tree, and then work it as a batch. It is hard to allow folks to plug in
> loaders (e.g. Maven parsers),

Do you think that's important? We could make that easier, at the cost of a
more complex loader implementation.

> and harder still to allow them to build/load
> the in-memory structures themselves.

That's not good. I hope that by keeping the "Objectifier" separate, we have
a more rigid separation between XML and in-memory structure. So you can just

  w = Workspace("test")
  r = CvsRepository("ant", ...)
  w.add_repository(r)
  m = Module("ant", ...)
  r.add_module(m)
  p = Project("bootstrap-ant", ...)
  m.add_project(p)
  c1 = Mkdir("build", ...)
  p.add_command(c1)
  c2 = Script("bootstrap", ...)
  p.add_command(c2)

(note that that's not possible now because you need to set both the parent
and the child relationships yourself, but it will be easy to add; I already
did this for the dependency/dependee relationships). Hey, that's a skeleton
unit test right there! testItIsEasyToBuildAGumpModelManually()
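
A sketch of how the add_* methods could set both sides of the relationship in
one call. The constructors are reduced to a single name argument and the
attribute names (module, repository, projects, modules) are assumptions; the
real classes take more arguments.

```python
class Project:
    def __init__(self, name):
        self.name = name
        self.module = None  # parent link, filled in by Module.add_project


class Module:
    def __init__(self, name):
        self.name = name
        self.repository = None  # parent link, filled in by add_module
        self.projects = {}

    def add_project(self, project):
        # Set the child side and the parent side in one call.
        self.projects[project.name] = project
        project.module = self


class CvsRepository:
    def __init__(self, name):
        self.name = name
        self.modules = {}

    def add_module(self, module):
        self.modules[module.name] = module
        module.repository = self


r = CvsRepository("ant")
m = Module("ant")
r.add_module(m)
p = Project("bootstrap-ant")
m.add_project(p)
```

With this in place the caller never touches the parent links directly, so the
two directions cannot drift apart.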

> This is true for "loading", for
> "modelling", and for much of our core. This is where we fail to have a
> system that we or others can break into pieces, use in pieces. I think this
> is where we need components.

Could be. But I really doubt someone actually wants to write (for example) a
different Resolver for href resolution. We need to think real hard about
which bits actually need multiple implementations there.

> I don't know if all things can be simple components, or if we need some
> "interfaces" (e.g. a LoaderComponent, a BuilderComponent, etc.) In Gump2 I
> tried the latter (if not formally as components) 'cos I felt it was less
> pure, more practical, and better fitting the need. I'd like to hear
> viewpoints on that, 'cos I think it is key.

I think that the core actor "interface" (which I renamed AbstractPlugin)
will do. I'd like to give it a try. I hope it results in less coupling.
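
To make the idea of one small plugin "interface" concrete, here is a toy
sketch. The hook names (visit_module, visit_project), the NameCollector
plugin and the walk function are all hypothetical; this is not the actual
AbstractPlugin from the gump3 branch.

```python
class AbstractPlugin:
    """Hypothetical base class: every hook is a no-op by default."""

    def visit_module(self, module):
        pass

    def visit_project(self, project):
        pass


class NameCollector(AbstractPlugin):
    """Example plugin that only cares about projects."""

    def __init__(self):
        self.seen = []

    def visit_project(self, project):
        self.seen.append(project)


def walk(modules, plugins):
    """Toy walker: 'modules' maps a module name to its project names."""
    for module, projects in modules.items():
        for plugin in plugins:
            plugin.visit_module(module)
        for project in projects:
            for plugin in plugins:
                plugin.visit_project(project)


collector = NameCollector()
walk({"ant": ["bootstrap-ant", "ant"]}, [collector])
```

A plugin overrides only the hooks it needs, and the walker never has to know
what any particular plugin does.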

Phew. We really should start writing some unit tests. If I had the time I
would start from scratch one more time using a test-first approach, but I
haven't figured out how to comfortably do test-first python development yet.

Cheers all!

- Leo



