Jason van Zyl wrote:

On 4 Jul 06, at 1:45 PM, Steve Loughran wrote:

In a way, much of the stuff in M2 is experimental: a build tool that effectively encodes beliefs about how a project should be structured and delivered, focusing on component-based development instead of application dev. I also think it's time to look at how well some of the experiment is working.


You make it sound like we're some sort of cult :-)

I think you are exploring cutting-edge, loosely coupled software development processes. It's research. Interesting, fun research, but research nonetheless. Just as Gump is an experiment in whether a unified nightly build changes people's working processes.

I've been hanging around with semantic-web people recently, and have devolved into using the word "belief" where they use "fact", because of differences of opinion on what they and I think RDF triples are (they think they're facts in a graph; I think every triple is a belief published by an entity at a particular moment in time). The nice thing about a belief-centric model is that you get to accept that different entities have different beliefs, and that a single entity/agent can change its belief set, without ever having to worry about the fact that the global belief-set is inconsistent.

In real agent-oriented runtimes (still very much academic research, even more so than RDF engines), the resolver takes into account the metadata about which agent issued a belief statement, and when, during its resolution process. Newer statements by the same entity can override older ones; differences between entities are allowable, but result in ambiguities that may need to be dealt with further down the line.
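As a toy sketch of that resolution rule (illustrative names only; this isn't any real RDF or agent framework): beliefs are keyed by issuer and subject, a newer statement from the same issuer replaces the older one, and disagreement between issuers surfaces as an ambiguity rather than an error.

import java.util.HashMap;
import java.util.Map;

// Toy belief store: latest statement per (issuer, subject) wins;
// different issuers disagreeing is ambiguity, not inconsistency.
public class BeliefStore {

    public static final class Belief {
        final String issuer, subject, statement;
        final long timestamp;
        public Belief(String issuer, String subject, String statement, long timestamp) {
            this.issuer = issuer; this.subject = subject;
            this.statement = statement; this.timestamp = timestamp;
        }
    }

    private final Map<String, Belief> latest = new HashMap<String, Belief>();

    public void assertBelief(Belief b) {
        String key = b.issuer + "|" + b.subject;
        Belief old = latest.get(key);
        if (old == null || b.timestamp >= old.timestamp)
            latest.put(key, b);          // newer statement by the same issuer overrides
    }

    // True if two issuers currently make different claims about the subject.
    public boolean isAmbiguous(String subject) {
        String seen = null;
        for (Belief b : latest.values()) {
            if (!b.subject.equals(subject)) continue;
            if (seen != null && !seen.equals(b.statement)) return true;
            seen = b.statement;
        }
        return false;
    }
}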

When you apply the same agent-oriented view to POM metadata, you can say "a POM file represents the POM author's beliefs about the artifact's dependencies at the time they wrote the POM". It may be that those beliefs match what the artifact really needs; it may be that they turn out to be utterly wrong.

[interlude. I just grabbed the chair of the W3C RDF working group by the coffee machine. Apparently "a belief is a state of mind", "a fact is something that is believed". So all facts are beliefs, the only variable being the number of believers]

Because the ibiblio repository contains fact/belief metadata from so many sources, it's that much harder to reconcile than metadata from a single entity. The good news is that we do have a very nice way to test these assertions in Java: running the program and seeing what classes get loaded. So when someone is utterly wrong in their dependencies, it's pretty obvious. It's when they are slightly wrong, when they use some classes only in certain cases, often using reflection to bind at run time, that you can get caught out.
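As a rough illustration of that test: run the app with "java -verbose:class ...", capture the output, and summarise which jars actually supplied classes. This little parser assumes the classic HotSpot log format "[Loaded com.example.Foo from /path/foo.jar]", which varies between JVMs and versions. Jars that never show up in the report are candidates for the exclusion list.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Summarise a -verbose:class log: which jar supplied how many classes.
public class LoadedClassReport {
    // assumed log format: [Loaded com.example.Foo from /path/to/foo.jar]
    private static final Pattern LOADED =
            Pattern.compile("\\[Loaded (\\S+) from (\\S+)\\]");

    public static void main(String[] args) throws IOException {
        Map<String, Set<String>> byJar = new TreeMap<String, Set<String>>();
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = in.readLine()) != null) {
            Matcher m = LOADED.matcher(line);
            if (!m.find()) continue;
            Set<String> classes = byJar.get(m.group(2));
            if (classes == null) {
                classes = new TreeSet<String>();
                byJar.put(m.group(2), classes);
            }
            classes.add(m.group(1));
        }
        in.close();
        for (Map.Entry<String, Set<String>> e : byJar.entrySet())
            System.out.println(e.getValue().size() + " classes loaded from " + e.getKey());
    }
}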



The phrase "encoding beliefs" is an inaccurate description. It's simply the pursuit of best practices for software development, and those practices are very much mutable, this thread being very good evidence of that. We're also not focused solely on component-oriented development: we develop applications ourselves, and we're trying to make that coherent as well.

OK, how about "encoding the team's ideas and experience of how to build applications as sets of components, using shared repositories to exchange components and their metadata"?


Personally, I always experience a bit of fear when adding a new dependency to a project. I fear the repository stuff, and estimate a couple of hours to get every addition stable, primarily by building up a good exclusion list.

This is the place to talk about that, as people shouldn't be fearful of adding dependencies. But people who have an ideal setup, where they completely control the repository they use internally, don't have many of the problems that people are experiencing in this thread. Having a public repository of high quality is not a trivial task.


Is it worse than before? Better? Or just, well, different? And if things are either worse or not as good as they could be, what can be changed?


The process is absolutely better. The process coupled with the public infrastructure we have now is problematic. Two very different things.

One underlying cause seems to be POM quality. Open source software dev is a vast collection of loosely coupled projects, and what we get in the repository in terms of metadata matches this model. Each project produces artifacts that match its immediate needs, with POM files that appear to work at the time of publishing. Maven then caches those and freezes that metadata forever, even if it turns out that the metadata was wrong. There's far better coherence within Gump, where the metadata is effectively maintained more by the Gump team themselves than by the individual projects.

There is absolutely no way this is scalable over time. You are saying that a small group of people can maintain metadata for projects that they are not intimately involved with? That's like saying that people who live outside your community have a better chance of describing your community. I really just don't think that's possible. How many problems has Gump had over the years trying to maintain the metadata? Huge problems: it is almost never in sync with projects. You basically find out when it breaks and backtrack, most of the time.

There is no doubt that the same process will happen with Maven, where users of Maven will eventually make their metadata better, but that will take time. Gump has been around for 5-6 years now; people are really only starting to use Maven 2.x, which is closing in on being out for a year. I am willing to bet that in another year a great number of the problems seen in this thread will be gone. I would argue that Gump will not work precisely because it is not the projects themselves maintaining the metadata. Projects using Maven will eventually get it right, because it provides some value to them to get it right.


Oh, I agree, handwritten custom-coded stuff doesn't scale. That is the price of that model, and it makes it hard to use the same tools within your own build process. But it does support the low-hanging fruit of things that depend on commons-logging yet don't want logkit on their classpath.

Gump's problem is not just that the metadata is written by the gumpers, and not the projects, but that the projects don't always care if the build is broken. Getting someone to care about what happens to their stuff downstream is the first step to fixing the problem. As more M2 take-up occurs, you should get a lot of that feedback in the system, moving from "please redist on the Maven repository" to "please have good metadata", before finally the joy of silence, as everything works.



The question is, what to do about it? And if the M2 repository was an attempt to leave the problems of the M1 repository behind, has it worked?


To a large extent I would say we have fixed many of the problems on a technical level. Correcting the metadata, and educating projects as to how best to maintain it, is a social problem and a matter of education. Couple that with some automated integrity checks that will be performed by the repository manager.

Yes, I think more rigour in accepting POMs would be good. People, even Apache projects, should not be able to submit an artifact to the repository without:

- everything you depend on being there; no unresolvable artifacts.
- no dependencies on -SNAPSHOT. I know, Apache projects aren't meant to release in that mode, but Apache Muse managed it, with very bad consequences downstream (see the checker sketch after this list).
- a (manual) review of your dependencies. You, the submitter, would get told your dependencies; the repository mailing list would somehow get a submission note that listed the complete dependency graph of that component.
- the repository analyzer having some (extensible) rules about generally "bad" dependencies, those that should be flagged with a warning, e.g. junit.jar in the runtime, any of the XML implementations in there (rather than just the stax/xml-apis API imports), use of commons-logging over commons-logging-api.
- flagging the appearance of strongly-deprecated versions of things, e.g. junit-3.7, anything else that is not in modern use and/or has security holes.
- a scan of the artifacts to see which packages they publish; store a list of all classes. Then scan their imports to see what they explicitly import. Warn when something they import isn't published by anything they even optionally depend upon (a rough sketch of this follows below).
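To make one of those checks concrete, here is a minimal sketch of the -SNAPSHOT rule. The class name and command-line interface are hypothetical, not part of any real repository manager; it just walks a submitted POM with the JDK's DOM parser and fails if any dependency version ends in -SNAPSHOT.

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical repository-side rule: reject POMs declaring -SNAPSHOT dependencies.
public class SnapshotRule {

    public static void main(String[] args) throws Exception {
        Document pom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File(args[0]));
        NodeList deps = pom.getElementsByTagName("dependency");
        boolean ok = true;
        for (int i = 0; i < deps.getLength(); i++) {
            Element dep = (Element) deps.item(i);
            String version = text(dep, "version");
            if (version != null && version.endsWith("-SNAPSHOT")) {
                System.err.println("REJECT: dependency " + text(dep, "groupId")
                        + ":" + text(dep, "artifactId")
                        + " is on snapshot version " + version);
                ok = false;
            }
        }
        System.exit(ok ? 0 : 1);   // non-zero exit: bounce the submission
    }

    private static String text(Element dep, String tag) {
        NodeList list = dep.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent();
    }
}

Note it only checks literal <dependency> elements; version ranges and inherited versions would need the real Maven resolver behind them.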

We could have some fun there, given the appropriate amount of spare time. I quite like the idea of .class-level validation...
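Something like this, perhaps: a rough sketch of the class-level scan, using nothing but the JDK. It lists the classes a jar publishes and the external classes its bytecode references, by walking each class file's constant pool. It assumes the pre-Java-7 constant-pool tags; cross-checking the external references against what the declared dependencies publish would be the obvious next step.

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// List the classes a jar publishes and the external classes it references,
// by reading the CONSTANT_Class entries of each class file's constant pool.
public class JarClassScanner {

    public static void main(String[] args) throws IOException {
        JarFile jar = new JarFile(args[0]);
        Set<String> published = new TreeSet<String>();
        Set<String> referenced = new TreeSet<String>();
        for (Enumeration<JarEntry> e = jar.entries(); e.hasMoreElements();) {
            JarEntry entry = e.nextElement();
            if (!entry.getName().endsWith(".class")) continue;
            String name = entry.getName();
            published.add(name.substring(0, name.length() - 6).replace('/', '.'));
            InputStream in = jar.getInputStream(entry);
            try {
                referenced.addAll(classRefs(new DataInputStream(in)));
            } finally {
                in.close();
            }
        }
        referenced.removeAll(published);     // keep only classes supplied by others
        System.out.println(published.size() + " classes published, "
                + referenced.size() + " external classes referenced:");
        for (String c : referenced) System.out.println("  " + c);
    }

    // Walk the constant pool (pre-Java-7 tags only) and return class names.
    private static Set<String> classRefs(DataInputStream in) throws IOException {
        if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        in.readUnsignedShort();              // minor version
        in.readUnsignedShort();              // major version
        int count = in.readUnsignedShort();  // constant_pool_count
        String[] utf8 = new String[count];
        Set<Integer> classEntries = new TreeSet<Integer>();
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: utf8[i] = in.readUTF(); break;                    // Utf8
                case 7: classEntries.add(in.readUnsignedShort()); break;  // Class
                case 8: in.skipBytes(2); break;                           // String
                case 3: case 4: case 9: case 10: case 11: case 12:
                    in.skipBytes(4); break;  // int, float, refs, name-and-type
                case 5: case 6:              // long, double occupy two pool slots
                    in.skipBytes(8); i++; break;
                default: throw new IOException("unexpected constant-pool tag " + tag);
            }
        }
        Set<String> names = new TreeSet<String>();
        for (int idx : classEntries) {
            String name = utf8[idx];
            if (name != null && !name.startsWith("["))   // skip array descriptors
                names.add(name.replace('/', '.'));
        }
        return names;
    }
}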

-steve

