On 24 Oct 06, at 3:18 PM, Brian Topping wrote:
> With the snapshots repo down, there was some discussion on IRC.
> Joakim mentioned there was some discussion of a resolution that was
> "DNS-like, not actual DNS" and it got me thinking DNS might be a
> better solution (possibly with RFC-2782 extensions) to resolve
> repositories. Apologies if this echoes discussion at ApacheCon.
> This solves some problems:
> 1) Downtime at well-known repositories such as we are seeing today
> could be backed by the actual repository that released the code.
> If a central repository goes down, the source repository that
> provided the original artifact would act as a fallback. DNS is
> distributed, so there is no central point of failure for artifact
> resolution.
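The fallback Brian describes could, in RFC-2782 style, map a groupId back to its owning domain via the reverse-domain convention and form an SRV query name for it. A minimal sketch; the "_maven-repo._tcp" service label and the resolution order are assumptions, not an established convention:

```python
def srv_name_for_group(group_id: str, service: str = "_maven-repo._tcp") -> str:
    """Build an RFC-2782-style SRV query name from a Maven groupId.

    A groupId such as "org.apache.maven" follows the reverse-domain
    convention, so reversing its segments yields the owning domain
    "maven.apache.org". The service/protocol labels are hypothetical.
    """
    domain = ".".join(reversed(group_id.split(".")))
    return f"{service}.{domain}"


def fallback_hosts(group_id: str, central: str = "repo.maven.org") -> list:
    """Assumed resolution order: central first, then the source domain."""
    domain = ".".join(reversed(group_id.split(".")))
    return [central, domain]
```

A client would try central first and, on failure, look up the SRV name to locate the repository that originally released the artifact.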
If someone checks out your project and you specify your own
repositories, then your repositories will be used. The central
repository provides convenience so that you don't have to specify any
repositories. We know there need to be mirrors, but first things
first. The central repository is now hosted by Contegix, so it is not
likely to go down. What happened today could be prevented by holding
a time slice of snapshots on the central repository, which would make
things far more convenient.
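Declaring your own repositories today is just a POM fragment; the id and URL below are placeholders:

```xml
<repositories>
  <repository>
    <id>my-project-repo</id>
    <url>http://repo.example.com/maven2</url>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>
```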
I am totally open to anyone finding mirrors for the central
repository. That is the first step; once we have them, there are a
number of things we could do, like what Apache does itself in terms
of locating the mirror closest to the user. So we might have a setup
like ca.repo.maven.org, za.repo.maven.org ... and whoever else wants
to donate some space. The search logic could be embedded into Maven
on the client side, or the Contegix machine could do redirects. The
first step is finding other machines. I think the best pattern for
ease of use is just having Maven do the right thing and find the
artifacts, and a central repository replicated in various regions of
the world would be best, IMO. Once the initial rsync is done, which
can be expensive, the subsequent maintenance is manageable. We want
to make the repository infrastructure robust, which is why Contegix
is involved.
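The client-side search logic could start as simply as preferring a country-coded mirror when one exists and falling back to central otherwise. A sketch; the mirror hostnames are the hypothetical ones above:

```python
# Hypothetical country-coded mirrors, as suggested above.
MIRRORS = {
    "ca": "ca.repo.maven.org",
    "za": "za.repo.maven.org",
}
CENTRAL = "repo.maven.org"


def pick_mirror(country_code: str) -> str:
    """Prefer a mirror in the user's country; otherwise use central."""
    return MIRRORS.get(country_code.lower(), CENTRAL)
```

Alternatively, the server side could answer with an HTTP redirect to the nearest mirror, as Apache's own download pages do.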
> 2) Authenticity of artifacts is validated by control of DNS. The
> current method of getting an artifact into the central repository
> isn't scalable. If you know someone well enough, they put your
> code into the repository. If you don't know someone, your request
> gets put on a list of things to do. It's the way it has to work
> with a central repo. But if the repo could be found by DNS
> resolution, anyone could publish. It's up to the client to decide
> if a jar with <groupId>org.viruswriters</groupId> is safe to depend
> on, and it can be resolved without burdening central repository
> maintainers to decide whether to publish it since the crew at
> viruswriters.org could simply add their external repository to
> DNS. Done.
The manual process has to go away; we know that. And again, you can
overcome this limitation today by specifying your own repositories in
your POMs. What should happen in the future is that once a project is
validated with a PGP key, we can take artifacts from that project in
an automated way forevermore. We could even just take their POMs and
build the artifacts from source in a secure environment. It's simply
a matter of time, and Archiva is clipping along. If anyone here wants
to help make the automated submission of artifacts a reality, I've
got a big list for you.
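Automated submission aside, clients can already verify the SHA-1 checksum that Maven repositories publish next to each artifact; PGP signatures would add publisher identity on top of this integrity check. A minimal sketch:

```python
import hashlib


def verify_artifact(artifact_bytes: bytes, published_sha1: str) -> bool:
    """Check an artifact's SHA-1 digest against the repository's .sha1 file.

    The published value is a hex digest; tolerate trailing whitespace
    and letter case, which vary between tools.
    """
    actual = hashlib.sha1(artifact_bytes).hexdigest()
    return actual == published_sha1.strip().lower()
```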
> 3) Use of artifacts could be logged. I would like to be able to
> use log analyzers to know who is using my artifacts and what part
> of the world they are coming from. I can't do this with a central
> repository. If I have a sufficiently fast line, I should be able
> to run my own repo and collect these logs.
Since the central repository moved over to Contegix, all artifact use
has been logged, so we do have stats. Again, if you want to write
something to analyse the logs per project, I'll give you access to
them; the information is now being collected. I hope to eventually
serve the repository itself with Jetty and create a special handler
to collect the information as artifacts are downloaded, so we have
realtime stats.
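Assuming the logs are in the common access-log format (an assumption; the exact format depends on the server), a per-artifact download counter could start like this; the regex and path filter are illustrative:

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "request" status bytes
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "GET (\S+) HTTP/[\d.]+" 200 \d+')


def count_downloads(lines):
    """Count successful artifact downloads (jars and POMs) per path."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group(2).endswith((".jar", ".pom")):
            counts[m.group(2)] += 1
    return counts
```

The same captured host field could feed a GeoIP lookup to answer the "what part of the world" question.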
> Central repositories are still important, but their role would
> change to a fast cache of the distributed repos. Artifact
> suppliers would be mirrored if their artifacts were considered
> important enough, and mirroring them would speed up builds for the
> masses, not because someone successfully campaigned to get their
> artifact in.
I think it's nice to wish that everyone is going to be able to
provide a high QoS, but I just don't think that's going to happen. If
you want to provide users of your builds with repositories that you
host, again, you can do that. But what we need is a robust central
infrastructure that works for everything, and that includes:
- many reliable mirrors
- an easy way to submit artifacts
- a convenient way to validate projects
- an easy way to manage a project's credentials
But to start with, if anyone can find us mirrors, we can begin the
process. If anyone wants to take on any of these pieces, feel free to
step up. I can keep you busy :-)
Jason.
> WDYT?
> -b