Wendy Smoak wrote:
Does Archiva still use Lucene for indexing?  I can't find the
configuration for the index directory in a reasonably recent version.
It used to be <indexPath> in archiva.xml.

If so, where is the index stored now? (And how do I move it?)

Thanks,

Archiva 1.0 and index file storage.

How it works:

There are 2 modules in play.
archiva-indexer
archiva-configuration


In archiva-indexer there is an interface called...
  org.apache.maven.archiva.indexer.RepositoryContentIndexFactory
With a default implementation ...
  org.apache.maven.archiva.indexer.lucene.LuceneRepositoryContentIndexFactory
which is responsible for setting up the various indexes.

The index is per-repository. This is intentional: it allows security to be applied around each repository, so that a search can be allowed or denied based on roles, etc.

The index directory is calculated using the following chunk of code (found in LuceneRepositoryContentIndexFactory)

    /**
     * Obtain the index directory for the provided repository. 
     * 
     * @param repository the repository to obtain the index directory from.
     * @param indexId the id of the index
     * @return the directory to put the index into.
     */
    private File toIndexDir( ArchivaRepository repository, String indexId )
    {
        if ( !repository.isManaged() )
        {
            throw new IllegalArgumentException( "Only supports managed repositories." );
        }

        // Attempt to get the specified indexDir in the configuration first.
        RepositoryConfiguration repoConfig = configuration.getConfiguration().findRepositoryById( repository.getId() );
        File indexDir;

        if ( repoConfig == null )
        {
            // No configured index dir, use the repository path instead.
            String repoPath = repository.getUrl().getPath();
            indexDir = new File( repoPath, ".index/" + indexId + "/" );
        }
        else
        {
            // Use configured index dir.
            String repoPath = repoConfig.getIndexDir();
            if ( StringUtils.isBlank( repoPath ) )
            {
                repoPath = repository.getUrl().getPath();
                if ( !repoPath.endsWith( "/" ) )
                {
                    repoPath += "/";
                }
                repoPath += ".index";
            }
            indexDir = new File( repoPath, "/" + indexId + "/" );
        }

        return indexDir;
    }
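
This means that if indexDir is specified in the configuration for this repository, Archiva uses it as the 'topmost' directory for the indexes. If it is not specified (or is blank), the indexes go into a .index directory under the repository path.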
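
To make the rule above easier to try out, here is a minimal standalone sketch of the same decision using plain strings instead of the Archiva types. The class name IndexDirSketch and its parameters are made up for illustration and are not part of Archiva; the paths are the ones from the archiva.xml examples in this message, and "bytecode" is one of the index directory names from the listing further down.

```java
import java.io.File;

// Hypothetical, simplified stand-in for toIndexDir(); not Archiva code.
final class IndexDirSketch {

    /**
     * @param repoPath           filesystem path of the managed repository
     * @param configuredIndexDir value of <indexDir> for the repository, or null/blank if unset
     * @param indexId            id of the index, e.g. "bytecode"
     */
    public static File toIndexDir(String repoPath, String configuredIndexDir, String indexId) {
        String base = configuredIndexDir;
        if (base == null || base.trim().isEmpty()) {
            // No configured index dir: fall back to <repository path>/.index
            base = repoPath;
            if (!base.endsWith("/")) {
                base += "/";
            }
            base += ".index";
        }
        // The index id is always appended below the base directory.
        return new File(base, indexId);
    }

    public static void main(String[] args) {
        // Repository without <indexDir>
        System.out.println(toIndexDir("/home/joakim/java/archiva/snapshots/", null, "bytecode"));
        // Repository with <indexDir>/opt/indexes/corporate/</indexDir>
        System.out.println(toIndexDir("/home/joakim/java/archiva/corporate/",
                                      "/opt/indexes/corporate/", "bytecode"));
    }
}
```

The sketch leaves out the managed-repository check and the case where no repository configuration is found at all; both are handled in the real method quoted above.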

There are 3 types of indexes.
  • Bytecode - holds the class names, public method signatures, and package names.
  • FileContent - holds a raw file content index (for those files flagged as 'indexable' in the repository scan).
  • Hashcode - holds the file reference, as well as the artifact reference, complete with md5 and sha1 hashcodes.

(Some $HOME/.m2/archiva.xml examples)
Let's look at what happens with a simple repository definition.
    <repository>
      <id>snapshots</id>
      <name>Managed Snapshots Repository</name>
      <url>file:/home/joakim/java/archiva/snapshots/</url>
      <snapshots>true</snapshots>
    </repository>
This will result in the following directory structure ...

[EMAIL PROTECTED] .index]$ pwd
/home/joakim/java/archiva/snapshots/.index
[EMAIL PROTECTED] .index]$ ls -la
total 16
drwxr-xr-x 4 joakim joakim 4096 2007-05-29 15:45 .
drwxr-xr-x 6 joakim joakim 4096 2007-05-25 14:34 ..
drwxr-xr-x 2 joakim joakim 4096 2007-05-29 16:20 bytecode
drwxr-xr-x 2 joakim joakim 4096 2007-05-29 15:59 filecontent
drwxr-xr-x 2 joakim joakim 4096 2007-05-29 16:10 hashcodes
[EMAIL PROTECTED] .index]$


Notice that this is the default directory created for the Lucene indexes.
It can be changed with the following setting (at the moment it is found only in the archiva.xml file; oddly, no GUI for this setting exists yet.  Not sure how that slipped through):
    <repository>
      <id>corporate</id>
      <name>Managed Corporate Repository</name>
      <url>file:/home/joakim/java/archiva/corporate/</url>
      <indexDir>/opt/indexes/corporate/</indexDir>
    </repository>
This sets the index directory for the corporate repository to /opt/indexes/corporate/.
Remember, this is only a base directory for the indexes.  Let Archiva and Lucene manage the individual directories under this base directory.
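
On the "how do I move it?" part of the question: going by the code above, the sub-directories that sit under .index by default sit directly under indexDir once it is configured. So one possible approach, as a rough sketch only, is to stop Archiva, copy the contents of the old .index directory into the new base directory, and then add <indexDir> to archiva.xml. The IndexRelocationSketch class below is hypothetical and uses only the JDK. This message does not confirm whether Archiva reuses copied index files or simply rebuilds them on the next repository scan, so treat the copy as optional.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper, not part of Archiva: copies an index tree to a new base directory.
final class IndexRelocationSketch {

    /** Copies everything under oldBase into newBase, keeping relative paths. */
    public static void copyIndexTree(Path oldBase, Path newBase) throws IOException {
        try (Stream<Path> paths = Files.walk(oldBase)) {
            paths.forEach(source -> {
                Path target = newBase.resolve(oldBase.relativize(source).toString());
                try {
                    if (Files.isDirectory(source)) {
                        Files.createDirectories(target);
                    } else {
                        Files.createDirectories(target.getParent());
                        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: IndexRelocationSketch <old .index dir> <new indexDir>");
            return;
        }
        copyIndexTree(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

For the corporate repository above, the two arguments would be /home/joakim/java/archiva/corporate/.index and /opt/indexes/corporate/. The old directory is left in place, so it can be removed by hand once the new location is known to work.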

Hope this helps.

Hmm.  This should be put in the documentation. O:-)
(flags this as a todo, and continues work on MRM-410)
-- 
- Joakim Erdfelt
  Committer and PMC Member, Apache Maven
  Archiva Developer
  [EMAIL PROTECTED]
