Perhaps this is already in existence somewhere.  If so please point me in
the right direction.

I want to know which dependencies are the most popular, based not on
downloads but on how many other projects depend on them.
I want to explore the full dependency graph and see its evolution over
'time' (for instance seeing how fast versions of artifacts are adopted).
I want to create visual representations of all the dependencies just
because it would look cool.
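Ranking by dependents is essentially counting in-degree in the dependency graph. Below is a minimal sketch, assuming the dependency edges have already been extracted as hypothetical `(dependent, dependency)` pairs of `"groupId:artifactId"` strings. It counts each dependent project once per dependency, so a project declaring the same dependency twice is not double-counted.

```python
from collections import Counter


def rank_by_dependents(edges):
    """Rank artifacts by the number of distinct projects that depend on them.

    edges: iterable of (dependent, dependency) pairs, each a
    "groupId:artifactId" string (hypothetical input format).
    """
    dependents = {}
    for dependent, dependency in edges:
        # Use a set so repeated declarations count only once.
        dependents.setdefault(dependency, set()).add(dependent)
    counts = Counter({dep: len(srcs) for dep, srcs in dependents.items()})
    # Most-depended-upon artifacts first.
    return counts.most_common()


edges = [
    ("a:app", "junit:junit"),
    ("b:lib", "junit:junit"),
    ("b:lib", "log4j:log4j"),
    ("a:app", "junit:junit"),  # duplicate edge, ignored
]
print(rank_by_dependents(edges))
```

Tracking adoption over time would need a version component in each edge plus a timestamp per release, which this sketch leaves out.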

In general I want total access to all the metadata (pom files, essentially)
in the Maven Central repository, so I can see how the world's software fits
together on a 'global' scale.
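Once the pom files are available locally, the direct dependencies can be pulled out with the standard library's XML parser. This is a rough sketch, assuming POMs that use the `http://maven.apache.org/POM/4.0.0` namespace. It reads only the top-level `<dependencies>` block; it does not resolve parent POMs, `<dependencyManagement>`, or `${...}` property placeholders, and older POMs without a namespace would return nothing.

```python
import xml.etree.ElementTree as ET

# Namespace used by Maven 4.0.0 POMs (assumed; un-namespaced POMs will not match).
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def direct_dependencies(pom_xml):
    """Return (groupId, artifactId, version) tuples declared directly in a POM.

    version is None when the POM omits it (e.g. it is inherited or managed).
    """
    root = ET.fromstring(pom_xml)
    deps = []
    for dep in root.findall("m:dependencies/m:dependency", POM_NS):
        group = dep.findtext("m:groupId", default="", namespaces=POM_NS)
        artifact = dep.findtext("m:artifactId", default="", namespaces=POM_NS)
        version = dep.findtext("m:version", default=None, namespaces=POM_NS)
        deps.append((group, artifact, version))
    return deps
```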

Eventually I would like to explore the jar artifacts too, to get deeper
insight into which methods/classes are being referenced, but that is
phase 2. :)

From googling around, it appears that, understandably, it is improper to
simply wget the entire repo.  However, there don't seem to be any publicly
available torrents or other resources that would give me access to this data.

http://search.maven.org/#stats

457GB is a lot of data, but it isn't an unimaginable amount, and most of
that is no doubt the artifacts, not the metadata (pom files).

So I really have two questions:

1. What is the easiest path to getting rsync-type access to the full repo?
(I'd quite understand if I needed to pay a fee for this level of access.)
2. Failing that, what would be a legitimate way of just getting all the pom
files?
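For reference, each pom in Central sits at a predictable path in the standard Maven 2 repository layout, so a list of coordinates maps directly onto a list of pom URLs. A minimal sketch, assuming `https://repo1.maven.org/maven2/` as the base URL; it only builds the paths and fetches nothing, so it does not by itself answer the question of how to avoid loading the servers.

```python
# Assumed base URL of Maven Central's Maven 2 layout.
CENTRAL = "https://repo1.maven.org/maven2/"


def pom_path(group_id, artifact_id, version):
    """Relative path of a POM in the Maven 2 repository layout.

    Dots in the groupId become directory separators; the file is named
    <artifactId>-<version>.pom inside <group>/<artifactId>/<version>/.
    """
    group_path = group_id.replace(".", "/")
    return f"{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.pom"


print(CENTRAL + pom_path("junit", "junit", "4.11"))
```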

Basically I want to be a good guy and not put undue load on the servers, but
at the same time I really want the data.

Thanks,

Matt Taylor
http://blog.matthewjosephtaylor.com
