Jason van Zyl wrote:
On 4 Jul 06, at 1:45 PM, Steve Loughran wrote:
In a way, much of the stuff in M2 is experimental; a build tool that
effectively encodes beliefs about how a project should be structured
and delivered, focusing on component-based development instead of
application dev. I also think it's time to look at how well some of the
experiment is working.
You make it sound like we're some sort of cult :-)
I think you are exploring cutting edge loosely coupled software
development processes. It's research. Interesting, fun research, but
research nonetheless. Just as Gump is an experiment in whether a unified
nightly build changes people's working processes.
I've been hanging round with semantic-web people recently, and have
devolved into using the word "belief" where they use "fact", because of
differences of opinion on what they and I think RDF triples are (they
think they're facts in a graph, I think every triple is a belief
published by an entity at a particular moment in time). The nice thing
about a belief-centric model is you get to accept the fact that
different entities have different beliefs, and a single entity/agent can
change its belief set, without ever having to worry about the fact that
the global belief-set is inconsistent.
In real agent-oriented runtimes (still very much academic research, even
more than RDF engines), the resolver takes into account the metadata
about which agent issued a belief statement, and when, during its
resolution process. Newer statements by the same entity can override
older ones; differences between entities are allowable but result in
ambiguities that may need to be dealt with further down the line.
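That resolution rule (newest statement per entity wins; surviving disagreements between entities are surfaced as ambiguities) can be sketched as a toy model. All the names and the data shape here are invented for illustration, not taken from any real agent runtime:

```python
from collections import defaultdict

def resolve(statements):
    """Resolve (entity, timestamp, subject, value) belief statements.

    Within one entity, the newest statement about a subject wins;
    across entities, differing surviving values are kept as an
    ambiguity for a later stage to deal with.
    """
    # Keep only each entity's most recent statement per subject.
    latest = {}  # (entity, subject) -> (timestamp, value)
    for entity, ts, subject, value in statements:
        key = (entity, subject)
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, value)

    # Gather the surviving values per subject across all entities.
    by_subject = defaultdict(set)
    for (entity, subject), (_, value) in latest.items():
        by_subject[subject].add(value)

    resolved, ambiguous = {}, {}
    for subject, values in by_subject.items():
        if len(values) == 1:
            resolved[subject] = values.pop()
        else:
            ambiguous[subject] = values  # inter-entity disagreement
    return resolved, ambiguous
```

The point of the sketch is only that the global belief set never has to be consistent: conflicts are data, not errors.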
When you apply the same agent-oriented view to POM metadata, you can say
"a POM file represents the POM author's beliefs about the artifact's
dependencies at the time they wrote the POM". It may be that the beliefs
match what the artifact really needs; it may be that those beliefs turn
out to be utterly wrong.
[interlude. I just grabbed the chair of the W3C RDF working group by the
coffee machine. Apparently "a belief is a state of mind", "a fact is
something that is believed". So all facts are beliefs, the only variable
being the number of believers]
Because the ibiblio repository contains fact/belief metadata from so
many sources, it's that much harder to reconcile than metadata from
single entities. The good news is that we do have a very nice way to
test these assertions in Java: running the program and seeing what
classes get loaded. So when someone is utterly wrong in their
dependencies, it's pretty obvious. It's when they are slightly wrong,
when they use some classes only in certain cases, often using reflection
to bind at run time, that you can get caught out.
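A Python analogue of that Java reflection case (the module names here are invented): a dependency that is only bound reflectively stays invisible to static analysis, and a packaging mistake only surfaces when that particular code path runs:

```python
import importlib

def load_optional_codec(name):
    """Bind an implementation at run time, Class.forName-style.

    A wrong or missing dependency only surfaces when this function is
    actually called with the affected name -- not at startup, and not
    to any tool that merely inspects the declared dependencies.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        # In a real app this branch can silently hide a packaging bug.
        return None

# The broken dependency is only discovered on this code path:
codec = load_optional_codec("no_such_codec_module_xyz")
```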
The phrase "encoding beliefs" is an inaccurate description. It's
simply the pursuit of best practices for software development, and those
practices are very much mutable, this thread being very good evidence of
that. We're also not focused only on component-oriented development: we
develop applications ourselves, and we're trying to make that
coherent as well.
OK, how about "encoding the team's ideas and experience in how to build
applications as sets of components, using
shared repositories to exchange components and their metadata"?
Personally, I always experience a bit of fear when adding a new
dependency to a project because of the repository stuff, and estimate a
couple of hours to get every addition stable, primarily by building up a
good exclusion list.
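For anyone following along, an exclusion list looks something like this in an m2 POM; the versions and the choice of logkit as the unwanted transitive dependency are just illustrative:

```xml
<!-- Illustrative POM fragment: cutting an unwanted transitive
     dependency out of the graph at the point of declaration. -->
<dependency>
  <groupId>commons-logging</groupId>
  <artifactId>commons-logging</artifactId>
  <version>1.1</version>
  <exclusions>
    <exclusion>
      <groupId>logkit</groupId>
      <artifactId>logkit</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Building these up by trial and error, one bad transitive dependency at a time, is where the "couple of hours" goes.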
This is the place to talk about that, as people shouldn't be fearful of
adding dependencies. But people who have an ideal setup, where they
completely control the repository they use internally, don't have many of
the problems that people are experiencing in this thread. Having a
public repository of high quality is not a trivial task.
Is it worse than before? Better? Or just, well, different? And if
things are either worse or not as good as they could be, what can be
changed?
The process is absolutely better. The process coupled with the public
infrastructure we have now is problematic. Two very different things.
One underlying cause seems to be pom quality. Open source software dev
is a vast collection of loosely coupled projects, and what we get in
the repository in terms of metadata matches this model. Each project
produces artifacts that match their immediate needs, with POM files
that appear to work at the time of publishing. Maven then caches those
and freezes that metadata forever, even if it turns out that the
metadata was wrong. There's far better coherence within Gump, where
the metadata is effectively maintained more by the Gump team
itself than by the individual projects.
There is absolutely no way this is scalable over time. You are saying
that a small group of people can maintain metadata for projects that
they are not intimately involved with? That's like saying that people
who live outside your community have a better chance at describing your
community. I really just don't think that's possible. How many problems
has Gump had over the years trying to maintain the metadata? Huge
problems, almost never in sync with projects. You basically find out
when it breaks and backtrack most of the time. There is no doubt
that the same process will happen with Maven, where users of Maven will
eventually make their metadata better, but that will take time. Gump has
been around for 5-6 years now. People are really only starting to use
Maven 2.x, which is closing in on being out for a year. I am willing to bet
in another year a great number of the problems seen in this thread will
be gone. I would argue that Gump will not work precisely because it is
not the projects themselves maintaining the metadata. Projects using
Maven will eventually get it right because it provides some value to
them to get it right.
Oh I agree, handwritten custom-coded stuff doesn't scale. That is the
price of that model, and it makes
it hard to use the same tools within your own build process. But it does
support the low-hanging fruit of things that depend on commons-logging
yet don't want logkit on their classpath.
Gump's problem is not just that the metadata is written by the gumpers,
and not the projects, but that the projects don't always care if the
build is broken. Getting someone to care about what happens to their
stuff downstream is the first step to fixing the problem. As more m2
uptake occurs, you should get a lot of that feedback in the system,
moving from "please redist on the maven repository", to "please have
good metadata", before finally the joy of silence, as everything works.
Question is, what to do about it? And if the m2 repository was an
attempt to leave the problems of the M1 repository behind, has it worked?
To a large extent I would say we have fixed many of the problems on a
technical level. Correcting the metadata, and educating projects as to
how best to maintain it, is a social problem and a matter of education.
Couple that with some automated integrity checks that will be performed
by the repository manager.
Yes, I think more rigour in accepting POMs would be good. People,
even Apache projects, should not be able to submit an artifact to the
repository without:
-everything you depend on being there. No unresolvable artifacts.
-no dependencies on -SNAPSHOT. I know, Apache projects aren't meant to
release in that mode, but Apache Muse managed it, with very bad
consequences downstream.
-a (manual) review of your dependencies. You, the submitter would get
told your dependencies; the repository mail list would somehow get a
submission note that listed the complete depends graph of that component.
-the repository analyzer has some (extensible) rules about generally
"bad" dependencies, those that should be flagged with a warning. E.g.
junit.jar in the runtime, any of the XML implementations in there
(rather than just the stax/xml-apis API imports), use of commons-logging
over commons-logging-api.
-flag appearance of strongly-deprecated versions of things. e.g.
junit-3.7, anything else that is not in modern use and/or with security
holes.
-scan the artifacts to see which packages they publish; store a list of
all classes. Then scan their imports to see what they explicitly import.
Warn when something they import isn't published by anything they even
optionally depend upon.
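A couple of the checks above could be sketched as simple rules run over a flattened dependency list. This is a toy model: the data shape, the rule set, and the flagged artifacts are invented for illustration, not an actual repository-manager API:

```python
# Illustrative "bad dependency" rules, keyed by (groupId, artifactId).
FLAGGED = {
    ("junit", "junit"): "test tool in the runtime classpath?",
    ("commons-logging", "commons-logging"): "prefer commons-logging-api",
}

def check_dependencies(deps):
    """deps: list of (groupId, artifactId, version, scope) tuples.

    Returns human-readable problems: ERRORs that should block
    submission, WARNs that should go to a reviewer.
    """
    problems = []
    for group, artifact, version, scope in deps:
        # Hard rule: no -SNAPSHOT dependencies in a released artifact.
        if version.endswith("-SNAPSHOT"):
            problems.append(
                f"ERROR: {group}:{artifact} depends on snapshot {version}")
        # Soft rule: generally-bad dependencies on the runtime classpath.
        note = FLAGGED.get((group, artifact))
        if note and scope == "runtime":
            problems.append(f"WARN: {group}:{artifact} ({note})")
    return problems
```

The .class-level checks (scanning published packages against imports) would sit behind the same kind of rule interface, just fed by a bytecode scanner instead of the POM.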
We could have some fun there, given the appropriate amount of spare
time. I quite like the idea of .class-level validation...
-steve