[jira] Created: (LUCENE-1292) Tag Index

Jason Rutherglen (JIRA) Wed, 21 May 2008 07:08:19 -0700

Tag Index
---------

                 Key: LUCENE-1292
                 URL: https://issues.apache.org/jira/browse/LUCENE-1292
             Project: Lucene - Java
          Issue Type: New Feature
          Components: Index
    Affects Versions: 2.3.1
            Reporter: Jason Rutherglen



The tag index addresses two problems: slow field cache loading and range 
queries, and the need to reindex an entire document just to update fields that 
are not tokenized.  

The tag index holds untokenized terms with a docfreq of 1 in an index file that 
resembles a term dictionary.  The file also stores the docs per term, similar 
to LUCENE-1278.  The index also has a transaction log and an in-memory index 
for realtime updates to the tags.  The transaction log is periodically merged 
into the existing tag term dictionary index file.
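
As a rough illustration of the log-plus-merge idea above, here is a minimal 
in-memory sketch.  It is not the proposed implementation: the class 
TagLogSketch, its methods, and the "field:value" string form of a tag are all 
hypothetical, and a TreeMap stands in for the on-disk term dictionary file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: none of these names come from the issue or from Lucene.
final class TagLogSketch {
  // One logged change: add or remove a tag for a doc id.
  private static final class Op {
    final boolean add;
    final String tag;
    final int docId;

    Op(boolean add, String tag, int docId) {
      this.add = add;
      this.tag = tag;
      this.docId = docId;
    }
  }

  // Stands in for the tag term dictionary file: tag -> sorted doc ids.
  private final TreeMap<String, TreeSet<Integer>> merged = new TreeMap<>();
  // Stands in for the transaction log: changes not yet merged.
  private final List<Op> log = new ArrayList<>();

  void add(String tag, int docId) {
    log.add(new Op(true, tag, docId));
  }

  void delete(String tag, int docId) {
    log.add(new Op(false, tag, docId));
  }

  // Realtime read: copy the merged postings, then replay pending ops in order.
  TreeSet<Integer> docs(String tag) {
    TreeSet<Integer> result = new TreeSet<>();
    TreeSet<Integer> base = merged.get(tag);
    if (base != null) {
      result.addAll(base);
    }
    for (Op op : log) {
      if (op.tag.equals(tag)) {
        apply(result, op);
      }
    }
    return result;
  }

  // Periodic merge: fold the log into the dictionary, then clear the log.
  void merge() {
    for (Op op : log) {
      TreeSet<Integer> set = merged.computeIfAbsent(op.tag, k -> new TreeSet<>());
      apply(set, op);
      if (set.isEmpty()) {
        merged.remove(op.tag);
      }
    }
    log.clear();
  }

  int pendingOps() {
    return log.size();
  }

  private static void apply(TreeSet<Integer> set, Op op) {
    if (op.add) {
      set.add(op.docId);
    } else {
      set.remove(op.docId);
    }
  }
}
```

In this sketch a read sees an update as soon as it is logged, and merge() 
changes only where the data lives, not what docs() returns.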

The TagIndexReader extends IndexReader and is combined with a regular index 
through ParallelReader.  To support the IndexReader.document method, there is 
a skip pointer file that maps doc ids to terms: for each document it holds a 
pointer used to look up that document's terms.  
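
The doc id to terms lookup can be pictured with the following sketch.  It is an 
assumption about the general shape only: DocTermsPointerSketch and its methods 
are hypothetical, and in-memory arrays stand in for the pointer file and the 
term data it points into.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: an in-memory stand-in for a doc id -> terms pointer file.
final class DocTermsPointerSketch {
  // All per-document term lists, laid end to end.
  private final List<String> terms = new ArrayList<>();
  // pointers[d] is where doc d's terms start; pointers[d + 1] is where they end.
  private int[] pointers = new int[1];
  private int docCount = 0;

  // Appends the next document (ids are assigned sequentially) and returns its id.
  int addDocument(String... docTerms) {
    terms.addAll(Arrays.asList(docTerms));
    if (docCount + 2 > pointers.length) {
      pointers = Arrays.copyOf(pointers, pointers.length * 2 + 1);
    }
    docCount++;
    pointers[docCount] = terms.size();
    return docCount - 1;
  }

  // Follows the pointer for docId and returns a copy of that document's terms.
  List<String> termsFor(int docId) {
    if (docId < 0 || docId >= docCount) {
      throw new IndexOutOfBoundsException("docId " + docId);
    }
    return new ArrayList<>(terms.subList(pointers[docId], pointers[docId + 1]));
  }
}
```

Because the pointers are fixed width and indexed by doc id, finding a 
document's terms is a direct array read followed by one sequential scan of that 
document's slice.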

A higher-level class encapsulates writing a document with tag fields to both 
IndexWriter and TagIndexWriter.  This requires a hook into IndexWriter to 
coordinate doc ids and the flushing of segments to disk.  

The writer class could be as simple as:
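{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DocIdSetIterator;

public class TagIndexWriter {

  // Intended to attach the tag term to each doc id the iterator yields.
  public void add(Term term, DocIdSetIterator iterator) {
  }

  // Intended to detach the tag term from each doc id the iterator yields.
  public void delete(Term term, DocIdSetIterator iterator) {
  }
}
{code}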
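
The higher-level class that writes to both indexes has to keep doc ids and 
segment flushes aligned, since ParallelReader expects its sub-indexes to 
contain the same documents in the same order.  A minimal sketch of that 
coordination follows.  UnifiedWriterSketch and Sink are hypothetical stand-ins, 
not Lucene classes, and the sketch tracks doc ids only, not field contents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types are stand-ins, not Lucene classes.
final class UnifiedWriterSketch {
  // Stand-in for one side of the pair (the main index or the tag index).
  static final class Sink {
    final List<Integer> buffered = new ArrayList<>();       // doc ids awaiting flush
    final List<List<Integer>> segments = new ArrayList<>(); // flushed segments

    void add(int docId) {
      buffered.add(docId);
    }

    void flush() {
      if (buffered.isEmpty()) {
        return;
      }
      segments.add(new ArrayList<>(buffered));
      buffered.clear();
    }
  }

  final Sink mainIndex = new Sink();
  final Sink tagIndex = new Sink();
  private int nextDocId = 0;

  // Assigns one doc id and hands it to both sides, so the ids cannot drift apart.
  int addDocument() {
    int docId = nextDocId++;
    mainIndex.add(docId);
    tagIndex.add(docId);
    return docId;
  }

  // Flushes both sides together so their segment boundaries line up.
  void flush() {
    mainIndex.flush();
    tagIndex.flush();
  }
}
```

In the actual proposal this coordination would come from a hook inside 
IndexWriter rather than from a wrapper that owns the doc id counter.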

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

