[jira] Updated: (LUCENE-1313) Realtime Search

Jason Rutherglen (JIRA) Wed, 01 Apr 2009 14:07:35 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jason Rutherglen updated LUCENE-1313:
-------------------------------------

          Component/s: Index  (was: contrib/*)
        Fix Version/s: 2.9
             Priority: Minor  (was: Major)
          Description: 
Realtime search with transactional semantics.  

Possible future directions:
  * Optimistic concurrency
  * Replication

Encoding each transaction into a set of bytes by writing to a RAMDirectory 
enables replication.  Other methods make replication difficult: the document 
can easily be serialized, but the analyzer cannot.
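
As a rough illustration of the "set of bytes" idea, the sketch below packs a group of named index files into one byte array and unpacks it on the receiving side.  It is only a model under stated assumptions: a plain Map of file name to contents stands in for the files of a RAMDirectory, and the class name TransactionBytes and the length-prefixed layout are hypothetical, not taken from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: a Map<String, byte[]> stands in for the files that
// indexing one transaction into a RAMDirectory would produce.
final class TransactionBytes {

    // Layout (an assumption): file count, then for each file its name,
    // its length, and its raw bytes.
    static byte[] encode(Map<String, byte[]> files) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(files.size());
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                out.writeUTF(file.getKey());
                out.writeInt(file.getValue().length);
                out.write(file.getValue());
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reverses encode(); a replica could write these files into its own
    // directory without ever needing the analyzer.
    static Map<String, byte[]> decode(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            int count = in.readInt();
            Map<String, byte[]> files = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                byte[] data = new byte[in.readInt()];
                in.readFully(data);
                files.put(name, data);
            }
            return files;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the design is that the bytes already contain the analyzed index data, so the replica only copies files and does not have to re-run analysis.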

I think this issue can hold realtime benchmarks which include indexing and 
searching concurrently.

  was:
Provides realtime search using Lucene.  Conceptually, updates are divided into 
discrete transactions.  Each transaction is recorded in a transaction log, which 
is similar to the MySQL binary log.  Deletes in the transaction are applied to the 
existing indexes.  Document additions are made to an in-memory 
InstantiatedIndex.  The transaction is then complete.  After each transaction, 
TransactionSystem.getSearcher() may be called to search over the index, 
including the latest transaction.
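
The transaction flow described above can be modeled roughly as follows.  This is a simplified sketch that does not use Lucene: string ids and string bodies stand in for Terms and Documents, plain maps stand in for the on-disk indexes and the InstantiatedIndex, and the class name MiniTransactionSystem is hypothetical.  Applying deletes to the in-memory map as well is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, Lucene-free model of the flow: log, apply deletes, add in memory.
final class MiniTransactionSystem {
    private final List<String> transactionLog = new ArrayList<>();   // stand-in for the transaction log
    private final Map<String, String> onDisk = new HashMap<>();      // stand-in for existing on-disk indexes
    private final Map<String, String> inMemory = new HashMap<>();    // stand-in for the InstantiatedIndex
    private long transactionId = 0;

    // Applies one transaction atomically and returns its id.
    synchronized long commitTransaction(Map<String, String> adds, Set<String> deleteIds) {
        transactionId++;
        // 1. Record the transaction in the log first.
        transactionLog.add("tx " + transactionId + " adds=" + adds.keySet() + " deletes=" + deleteIds);
        // 2. Apply deletes to the existing indexes (in-memory ones too; an assumption here).
        for (String id : deleteIds) {
            onDisk.remove(id);
            inMemory.remove(id);
        }
        // 3. Additions go to the in-memory index only.
        inMemory.putAll(adds);
        return transactionId;
    }

    // Returns a read-only view that includes the latest completed transaction.
    // The view is a copy, so later transactions do not change it.
    synchronized Map<String, String> getSearcher() {
        Map<String, String> view = new HashMap<>(onDisk);
        view.putAll(inMemory);
        return Collections.unmodifiableMap(view);
    }
}
```

A caller would commit a transaction and then call getSearcher() to see its effects; a searcher obtained earlier keeps the older view.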

TransactionSystem is the main class.  It provides update methods similar to 
those of IndexWriter, and getSearcher() returns a Searcher. 

- getSearcher()
- addDocument(Document document)
- addDocument(Document document, Analyzer analyzer)
- updateDocument(Term term, Document document)
- updateDocument(Term term, Document document, Analyzer analyzer)
- deleteDocument(Term term)
- deleteDocument(Query query)
- commitTransaction(List<Document> documents, Analyzer analyzer, List<Term> 
deleteByTerms, List<Query> deleteByQueries)

Sample code:

{code}
// setup
FSDirectoryMap directoryMap = new FSDirectoryMap(new File("/testocean"), "log");
LogDirectory logDirectory = directoryMap.getLogDirectory();
TransactionLog transactionLog = new TransactionLog(logDirectory);
TransactionSystem system = new TransactionSystem(transactionLog, new SimpleAnalyzer(), directoryMap);

// transaction: add one document
Document d = new Document();
d.add(new Field("contents", "hello world", Field.Store.YES, Field.Index.TOKENIZED));
system.addDocument(d);

// search: the searcher includes the transaction that just completed
Query query = new TermQuery(new Term("contents", "hello"));
OceanSearcher searcher = system.getSearcher();
ScoreDoc[] hits = searcher.search(query, null, 1000).scoreDocs;
System.out.println(hits.length + " total results");
for (int i = 0; i < hits.length && i < 10; i++) {
  // use a new variable name; "d" is already declared above
  Document hit = searcher.doc(hits[i].doc);
  System.out.println(i + " " + hits[i].score + " " + hit.get("contents"));
}
{code}

There is a test class org.apache.lucene.ocean.TestSearch that was used for 
basic testing.  

A sample disk directory structure is as follows:

|/snapshot_105_00.xml | XML file listing the indexes, and their generation 
numbers, that make up a snapshot.  Each transaction creates a new snapshot 
file.  In this file name, 105 is the snapshotid, also known as the 
transactionid, and 00 is the minor version of the snapshot, which corresponds 
to a merge.  A merge is a minor snapshot version because the data does not 
change, only the underlying structure of the index|
|/3 | Directory containing an on-disk Lucene index|
|/log | Directory containing log files|
|/log/log00000001.bin | Log file.  As new log files are created, the suffix 
number is incremented|
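
The naming scheme in the table can be sketched as below.  This is a guess at the conventions based only on the two example names: the class OceanFileNames is hypothetical, and the two-digit minor version and eight-digit log suffix are assumptions read off "snapshot_105_00.xml" and "log00000001.bin".

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the file names shown in the table above.
final class OceanFileNames {
    private static final Pattern SNAPSHOT = Pattern.compile("snapshot_(\\d+)_(\\d+)\\.xml");

    // Returns {snapshotId, minorVersion} parsed from a snapshot file name.
    static long[] parseSnapshot(String fileName) {
        Matcher m = SNAPSHOT.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a snapshot file: " + fileName);
        }
        return new long[] { Long.parseLong(m.group(1)), Long.parseLong(m.group(2)) };
    }

    // Builds a snapshot file name; a merge would bump only the minor version.
    static String snapshotName(long snapshotId, int minorVersion) {
        return String.format(Locale.ROOT, "snapshot_%d_%02d.xml", snapshotId, minorVersion);
    }

    // Given "logNNNNNNNN.bin", returns the name with the suffix number incremented.
    static String nextLogName(String current) {
        if (!current.startsWith("log") || !current.endsWith(".bin")) {
            throw new IllegalArgumentException("not a log file: " + current);
        }
        long number = Long.parseLong(current.substring(3, current.length() - 4));
        return String.format(Locale.ROOT, "log%08d.bin", number + 1);
    }
}
```

For example, under these assumptions a merge of snapshot 105 would be named by snapshotName(105, 1), and the log file following log00000001.bin would be nextLogName("log00000001.bin").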



    Affects Version/s: 2.4.1
              Summary: Realtime Search  (was: Ocean Realtime Search)

> Realtime Search
> ---------------
>
>                 Key: LUCENE-1313
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1313
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.4.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1313.patch, lucene-1313.patch, lucene-1313.patch, 
> lucene-1313.patch, lucene-1313.patch
>
>
> Realtime search with transactional semantics.  
> Possible future directions:
>   * Optimistic concurrency
>   * Replication
> Encoding each transaction into a set of bytes by writing to a RAMDirectory 
> enables replication.  Other methods make replication difficult: the document 
> can easily be serialized, but the analyzer cannot.
> I think this issue can hold realtime benchmarks which include indexing and 
> searching concurrently.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

