[jira] Updated: (LUCENE-1292) Tag Index

Jason Rutherglen (JIRA) Sat, 07 Jun 2008 10:30:09 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jason Rutherglen updated LUCENE-1292:
-------------------------------------

    Attachment: lucene-1292.patch

lucene-1292.patch

This patch contains the basic class structure and file formats, plus a stub 
for the LRU cache.  It adds a DocState parameter to IndexWriter.addDocument 
so the caller can obtain the docid.  A TagSegmentMerger class is still 
needed.  The package should possibly be moved to org.apache.lucene.index.tag.
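
The patch only stubs the LRU cache, so as a rough illustration of one common 
way to build such a cache in Java, here is a minimal sketch based on 
java.util.LinkedHashMap.  The class name LruCache and its generic key/value 
types are assumptions made for this example; they are not taken from the 
patch, which does not say what the cache holds.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical LRU cache sketch. The name and the generic key/value types
 * are illustrative assumptions, not the classes in lucene-1292.patch.
 */
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most
        // recently accessed, which is what LRU eviction needs.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // LinkedHashMap calls this after each insertion; returning true
        // evicts the least recently used entry once we exceed capacity.
        return size() > maxEntries;
    }
}
```

This version is not thread safe; a shared cache would need external 
synchronization, for example via Collections.synchronizedMap.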

Syncing with IndexWriter's merging process would currently be too complex, 
because IndexWriter sometimes handles deletes and updated documents in RAM 
before flushing.  The system I will use the Tag Index with manages segment 
merging and the deletion of documents outside of IndexWriter.  

> Tag Index
> ---------
>
>                 Key: LUCENE-1292
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1292
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.3.1
>            Reporter: Jason Rutherglen
>         Attachments: lucene-1292.patch
>
>
> The problems the tag index solves are slow field cache loading and range 
> queries, and the need to reindex an entire document to update fields that 
> are not tokenized.  
> The tag index holds untokenized terms with a docfreq of 1 in a 
> term-dictionary-like index file.  The file also stores the docs per term, 
> similar to LUCENE-1278.  The index also has a transaction log and an 
> in-memory index for realtime updates to the tags.  The transaction log is 
> periodically merged into the existing tag term dictionary index file.
> The TagIndexReader extends IndexReader and is unified with a regular index by 
> ParallelReader.  There is a doc-id-to-terms skip pointer file for the 
> IndexReader.document method.  This file contains a pointer for looking up the 
> terms for a document.  
> There is a higher-level class that encapsulates writing a document with tag 
> fields to IndexWriter and TagIndexWriter.  This requires a hook into 
> IndexWriter to coordinate doc ids and the flushing of segments to disk.  
> The writer class could be as simple as:
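> {code}
> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.DocIdSetIterator;
>
> public class TagIndexWriter {
>
>   // Intended to tag every doc id produced by the iterator with the term.
>   public void add(Term term, DocIdSetIterator iterator) {
>   }
>
>   // Intended to remove the term from every doc id produced by the iterator.
>   public void delete(Term term, DocIdSetIterator iterator) {
>   }
> }
> {code}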
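
To make the add/delete API and the transaction-log idea from the description 
more concrete, here is a minimal in-memory sketch.  It is not the patch's 
implementation: the class TagIndexSketch, its methods, and the use of plain 
Strings and ints in place of Term and DocIdSetIterator are all assumptions 
chosen so that the example runs without Lucene on the classpath.  Updates are 
appended to a log and are visible to reads immediately; mergeLog() folds them 
into the merged term-to-docs map, loosely mirroring the periodic merge into 
the term dictionary file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical stand-in for a tag index; not the classes in the patch. */
class TagIndexSketch {
    /** One logged update: add or delete a term for a single doc id. */
    private static final class LogEntry {
        final boolean add;
        final String term;
        final int docId;

        LogEntry(boolean add, String term, int docId) {
            this.add = add;
            this.term = term;
            this.docId = docId;
        }
    }

    // Merged state: term -> sorted doc ids (stands in for the on-disk file).
    private final TreeMap<String, TreeSet<Integer>> merged = new TreeMap<>();
    // Pending updates not yet merged (stands in for the transaction log).
    private final List<LogEntry> log = new ArrayList<>();

    void add(String term, int... docIds) {
        for (int docId : docIds) log.add(new LogEntry(true, term, docId));
    }

    void delete(String term, int... docIds) {
        for (int docId : docIds) log.add(new LogEntry(false, term, docId));
    }

    /** Replays the log in order into the merged map, then clears the log. */
    void mergeLog() {
        for (LogEntry e : log) {
            if (e.add) {
                merged.computeIfAbsent(e.term, t -> new TreeSet<>()).add(e.docId);
            } else {
                TreeSet<Integer> docs = merged.get(e.term);
                if (docs != null) {
                    docs.remove(e.docId);
                    if (docs.isEmpty()) merged.remove(e.term);
                }
            }
        }
        log.clear();
    }

    /** Doc ids for a term, combining merged state with pending log entries. */
    SortedSet<Integer> docs(String term) {
        TreeSet<Integer> result = new TreeSet<>();
        TreeSet<Integer> base = merged.get(term);
        if (base != null) result.addAll(base);
        for (LogEntry e : log) {
            if (!e.term.equals(term)) continue;
            if (e.add) result.add(e.docId); else result.remove(e.docId);
        }
        return result;
    }

    int pendingUpdates() {
        return log.size();
    }
}
```

For example, after add("color:red", 1, 2, 3) and delete("color:red", 2), 
docs("color:red") contains doc ids 1 and 3 both before and after mergeLog() 
is called; only the number of pending log entries changes.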
