I've only taken a quick look. I do like the idea of gathering more
information so this is a good avenue to pursue.
I have some really fundamental questions first though, so help me out:
- how would this replace scanning if it only deals with JARs?
- what are the assumptions that have been proven to be misguided that
you refer to?
- I haven't looked at the code, but my initial reaction was that it
either overlaps or copies a lot from current Archiva classes or
maven-shared-jar - is that correct or am I missing something?
When I think about collecting more information about artifacts, I
really think about decorators that can be added (so you can add your
own). Wouldn't this be best done as a small set of those? (which could
be integrated into trunk now - in fact it was something I wanted to
look at tomorrow to add the new archetype descriptor indexing now that
it's released)
Thanks,
Brett
On 11/03/2008, at 3:25 PM, Joakim Erdfelt wrote:
I've been working on and off on a few different Archiva-related
tools / tasks / libs.
Brett and Wendy convinced me to upload what I've got and outline what
I have in mind to let the creative juices flow. (Besides, I'm
running out of time to commit to Archiva, so this work will be slow
to progress if I do it alone.)
Concept: archiva-jarinfo.
A library for jar indexing / searching / identification for local
repositories, arbitrary directories of jars, and even remote
repositories.
For use by ...
* Archiva itself as a possible replacement for repository scanning,
indexing, and searching.
(Searching on checksums, filenames, classnames, imports,
identification fields, and even public / exposed methods)
* Archiva RepoMan WebStart Tool - a tool I've been wanting to help
identify and upload content to an Archiva repository.
* Archiva Maven Plugin - imagine typing
$ mvn archiva:search -Dquery=Logger
and getting hits on log4j, slf4j, commons-logging, plexus-logging,
etc... from both the local repository and remote repositories.
* Q4E integration - adding some ability to q4e to search local
repository and remote repositories for dependencies.
Some details.
(Some of this exists and works, some of it does not; remember this
is a Work in Progress)
The existing repository scanning / indexing in Archiva server makes
some assumptions that have proven to be misguided (such as only
searching for new content based on timestamp). The approach that
archiva-jarinfo takes instead is to reduce the cost of the
time-consuming part of the scan that the new-content timestamp check
attempts to avoid: the processing of the jar file.
This is done by checking for an xml file describing the contents of
the jar file (called ${artifact}-${version}.jarinfo). If the file
exists, it's considered up to date; if it doesn't exist, the jar
details are collected and the jarinfo file is created.
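
The check described above can be sketched roughly as follows. This is
only an illustration, not the archiva-jarinfo code: the class name
JarInfoCache, its methods, and the <jarinfo>/<entry> xml layout are all
assumptions, as is the idea that the jar is named
${artifact}-${version}.jar and the jarinfo file sits next to it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper; names and xml layout are assumptions, not the real library. */
class JarInfoCache {

    /** Maps foo-1.0.jar to foo-1.0.jarinfo in the same directory (assumed naming). */
    static Path jarInfoPathFor(Path jar) {
        String name = jar.getFileName().toString();
        String base = name.endsWith(".jar") ? name.substring(0, name.length() - 4) : name;
        return jar.resolveSibling(base + ".jarinfo");
    }

    /** Returns true if the jar had to be processed, false if an existing jarinfo was reused. */
    static boolean ensureJarInfo(Path jar) {
        Path info = jarInfoPathFor(jar);
        if (Files.exists(info)) {
            return false; // treated as up to date; the jar itself is never opened
        }
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            List<String> lines = new ArrayList<>();
            lines.add("<jarinfo>");
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                lines.add("  <entry>" + escape(entries.nextElement().getName()) + "</entry>");
            }
            lines.add("</jarinfo>");
            Files.write(info, lines, StandardCharsets.UTF_8);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Minimal xml escaping for entry names. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Calling JarInfoCache.ensureJarInfo(jar) a second time for the same jar
returns false without opening the jar, which is where the saving in
scan time would come from.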
I've seen this be useful if you sync or copy repository directories
too, as the jarinfo files come along for the ride and reduce the
need for Archiva to determine the jar details yet again.
The scan creates a Jar Info Bundle (*.jib file) that is just a jar
file with all of the *.jarinfo xml files in it, for consumption by
remote JarInfo clients to use for indexing purposes.
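
Since a *.jib file is described as just a jar containing the *.jarinfo
files, building one could look something like the sketch below. The
class JibWriter and the choice to keep repository-relative paths as
entry names are assumptions made for illustration; the real bundle
layout may differ.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical bundler; not the archiva-jarinfo implementation. */
class JibWriter {

    /** Packs every *.jarinfo under repoRoot into one jar-format file; returns the entry count. */
    static int writeBundle(Path repoRoot, Path jibFile) {
        try {
            List<Path> infos;
            try (Stream<Path> walk = Files.walk(repoRoot)) {
                infos = walk.filter(Files::isRegularFile)
                            .filter(p -> p.getFileName().toString().endsWith(".jarinfo"))
                            .sorted()
                            .collect(Collectors.toList());
            }
            try (OutputStream raw = Files.newOutputStream(jibFile);
                 JarOutputStream out = new JarOutputStream(raw)) {
                for (Path info : infos) {
                    // Assumed layout: repository-relative path, with '/' separators.
                    String name = repoRoot.relativize(info).toString().replace('\\', '/');
                    out.putNextEntry(new JarEntry(name));
                    Files.copy(info, out);
                    out.closeEntry();
                }
            }
            return infos.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A remote client could then read the bundle with java.util.jar.JarFile
and never touch the jars themselves.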
The JarInfo client uses the JarInfo lib to create an index for
checksums, jar content filenames, and public/exposed bytecode
information.
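
To make the index idea a bit more concrete, here is a minimal
in-memory sketch covering checksums and jar content filenames only
(the bytecode information is left out). JarInfoIndex and its methods
are hypothetical names, SHA-1 is an assumed checksum type, and the
real index is presumably persistent rather than a pair of maps.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory index; an illustration, not the JarInfo lib. */
class JarInfoIndex {
    private final Map<String, String> jarByChecksum = new HashMap<>();
    private final Map<String, Set<String>> jarsByEntryName = new HashMap<>();

    /** Lower-case hex SHA-1 of the given bytes. */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-1 is required on every Java platform
        }
    }

    /** Records one jar under its checksum and under each of its content filenames. */
    void add(String jarId, byte[] jarBytes, Collection<String> entryNames) {
        jarByChecksum.put(sha1Hex(jarBytes), jarId);
        for (String entry : entryNames) {
            jarsByEntryName.computeIfAbsent(entry, k -> new TreeSet<>()).add(jarId);
        }
    }

    /** Exact checksum lookup; returns null when the checksum is unknown. */
    String findByChecksum(String sha1) {
        return jarByChecksum.get(sha1);
    }

    /** Case-insensitive substring search over content filenames (linear scan). */
    Set<String> findByEntryName(String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : jarsByEntryName.entrySet()) {
            if (e.getKey().toLowerCase(Locale.ROOT).contains(needle)) {
                hits.addAll(e.getValue());
            }
        }
        return hits;
    }
}
```

With an index like this, a query such as "Logger" would return every
indexed jar that has a content filename containing that text.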
The JarInfo client can search local repos, remote repos, and even
arbitrary directories of jar files.
The JarInfo client can take an anonymous Jar file and perform a
series of identification checks in an attempt to identify the Jar
file based on jar file contents, and even similarity to jar files
found in the JarInfo indexes.
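
One way the similarity part of the identification could work is to
compare the set of content filenames of the unknown jar against each
indexed jar. The sketch below uses Jaccard similarity, which is my
own choice for illustration; the message does not say which measure
archiva-jarinfo uses. JarIdentifier, bestMatch and the threshold
parameter are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical identifier based on Jaccard similarity of content filenames. */
class JarIdentifier {

    /** Size of the intersection divided by size of the union; 0.0 when both sets are empty. */
    static double similarity(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return (double) common.size() / union;
    }

    /**
     * Returns the id of the most similar known jar whose score is positive
     * and at least the threshold, or null when no jar qualifies.
     */
    static String bestMatch(Set<String> unknown, Map<String, Set<String>> known, double threshold) {
        String best = null;
        double bestScore = 0.0;
        for (Map.Entry<String, Set<String>> candidate : known.entrySet()) {
            double score = similarity(unknown, candidate.getValue());
            if (score >= threshold && score > bestScore) {
                best = candidate.getKey();
                bestScore = score;
            }
        }
        return best;
    }
}
```

A checksum match would normally be tried first, with a similarity
search like this only as a fallback for jars that have been rebuilt
or repackaged.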
That's all the info I can squeeze out tonight; hopefully someone else
will find this useful.
Thanks,
- Joakim
--
Brett Porter
[EMAIL PROTECTED]
http://blogs.exist.com/bporter/