[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Rutherglen updated LUCENE-1313:
-------------------------------------

    Attachment: lucene-1313.patch

lucene-1313.patch

Depends on LUCENE-1312 and LUCENE-1314. More bugs fixed. Deletes are now committed to the indexes only intermittently, which dramatically improves update speed. MaybeMergeIndexes now runs via a background timer. I will remove the per-transaction snapshot.xml file in favor of a human-readable log, because creating and deleting these small files is a bottleneck for update speed; that way a transaction writes to only two files. The merges happen in the background and so never affect transaction update speed. I am not sure how useful it would be, but it is possible to build a priority-based IO system that favors transactions over merges: if a transaction comes in while a merge is writing to disk, the merge is paused, the transaction IO runs, and then the merge IO continues (a rough, hypothetical sketch of this idea appears at the end of this message). I am not sure how to handle Documents whose Fields have a TokenStream as the value, since I believe these cannot be serialized; for now I assume they will be unsupported. I am also not sure how to handle analyzers: are they generally serializable? Serializing them would be useful for a more automated log recovery process.

> Ocean Realtime Search
> ---------------------
>
>                 Key: LUCENE-1313
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1313
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>            Reporter: Jason Rutherglen
>         Attachments: lucene-1313.patch, lucene-1313.patch, lucene-1313.patch
>
>
> Provides realtime search using Lucene. Conceptually, updates are divided into discrete transactions. Each transaction is recorded to a transaction log, which is similar to the MySQL bin log. Deletes from the transaction are applied to the existing indexes. Document additions are made to an in-memory InstantiatedIndex. The transaction is then complete. After each transaction, TransactionSystem.getSearcher() may be called, which allows searching over the index including the latest transaction.
> TransactionSystem is the main class. Methods similar to IndexWriter's are provided for updating, and getSearcher() returns a Searcher:
> - getSearcher()
> - addDocument(Document document)
> - addDocument(Document document, Analyzer analyzer)
> - updateDocument(Term term, Document document)
> - updateDocument(Term term, Document document, Analyzer analyzer)
> - deleteDocument(Term term)
> - deleteDocument(Query query)
> - commitTransaction(List<Document> documents, Analyzer analyzer, List<Term> deleteByTerms, List<Query> deleteByQueries)
> Sample code:
> {code}
> // setup
> FSDirectoryMap directoryMap = new FSDirectoryMap(new File("/testocean"), "log");
> LogDirectory logDirectory = directoryMap.getLogDirectory();
> TransactionLog transactionLog = new TransactionLog(logDirectory);
> TransactionSystem system = new TransactionSystem(transactionLog, new SimpleAnalyzer(), directoryMap);
> // transaction
> Document d = new Document();
> d.add(new Field("contents", "hello world", Field.Store.YES, Field.Index.TOKENIZED));
> system.addDocument(d);
> // search
> Query query = new TermQuery(new Term("contents", "hello")); // a query against the indexed field
> OceanSearcher searcher = system.getSearcher();
> ScoreDoc[] hits = searcher.search(query, null, 1000).scoreDocs;
> System.out.println(hits.length + " total results");
> for (int i = 0; i < hits.length && i < 10; i++) {
>   Document doc = searcher.doc(hits[i].doc);
>   System.out.println(i + " " + hits[i].score + " " + doc.get("contents"));
> }
> {code}
> There is a test class org.apache.lucene.ocean.TestSearch that was used for basic testing.
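For illustration, here is a minimal sketch of how the batch commitTransaction(...) method listed above might be called. It is not from the patch: it assumes the TransactionSystem instance ("system") and SimpleAnalyzer from the sample code, the field values and delete terms are purely illustrative, and imports are omitted as in the sample above.

{code}
// batch several additions and deletes into a single transaction
List<Document> docs = new ArrayList<Document>();
Document doc = new Document();
doc.add(new Field("contents", "goodbye world", Field.Store.YES, Field.Index.TOKENIZED));
docs.add(doc);

List<Term> deleteByTerms = new ArrayList<Term>();
deleteByTerms.add(new Term("contents", "hello"));

List<Query> deleteByQueries = new ArrayList<Query>();
deleteByQueries.add(new TermQuery(new Term("contents", "stale")));

// the adds and deletes above become searchable together once this call returns
system.commitTransaction(docs, new SimpleAnalyzer(), deleteByTerms, deleteByQueries);
{code}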
> A sample disk directory structure is as follows:
> ||Path||Description||
> |/snapshot_105_00.xml|XML file recording which indexes, and which generation numbers of those indexes, make up a snapshot. Each transaction creates a new snapshot file. In this name, 105 is the snapshot id (also known as the transaction id) and 00 is the minor version of the snapshot, corresponding to a merge. A merge is only a minor snapshot version because the data does not change, only the underlying structure of the index.|
> |/3|Directory containing an on-disk Lucene index.|
> |/log|Directory containing log files.|
> |/log/log00000001.bin|Log file. As new log files are created, the suffix number is incremented.|
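A small, hypothetical sketch of how the snapshot file name described in the table above decomposes into a snapshot id and a minor merge version; the class name and regular expression are illustrative only and not part of the patch.

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the snapshot_<snapshotid>_<minorversion>.xml naming convention.
public class SnapshotFileName {
  private static final Pattern NAME = Pattern.compile("snapshot_(\\d+)_(\\d+)\\.xml");

  public static void main(String[] args) {
    Matcher m = NAME.matcher("snapshot_105_00.xml");
    if (m.matches()) {
      long snapshotId = Long.parseLong(m.group(1));    // 105: the snapshot/transaction id
      int minorVersion = Integer.parseInt(m.group(2)); // 00: bumped by a merge, data unchanged
      System.out.println("snapshot " + snapshotId + ", minor version " + minorVersion);
    }
  }
}
{code}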
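Finally, the rough sketch of the priority-based IO idea mentioned in the comment above: transaction IO takes priority, and the merge thread pauses at checkpoints while any transaction is writing. This class is hypothetical, the names are illustrative, and it is not part of lucene-1313.patch.

{code}
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical gate: merge IO yields to transaction IO at checkpoint boundaries.
public class IOPriorityGate {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition noTransactions = lock.newCondition();
  private int activeTransactions = 0;

  // Called by a transaction before it starts writing to disk.
  public void beginTransactionIO() {
    lock.lock();
    try {
      activeTransactions++;
    } finally {
      lock.unlock();
    }
  }

  // Called by a transaction after its writes are flushed.
  public void endTransactionIO() {
    lock.lock();
    try {
      if (--activeTransactions == 0) {
        noTransactions.signalAll();
      }
    } finally {
      lock.unlock();
    }
  }

  // Called by the merge thread between chunks; blocks while any transaction is writing.
  public void mergeCheckpoint() throws InterruptedException {
    lock.lock();
    try {
      while (activeTransactions > 0) {
        noTransactions.await();
      }
    } finally {
      lock.unlock();
    }
  }
}
{code}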