[ 
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627612#action_12627612
 ] 

Karl Wettin commented on LUCENE-1313:
-------------------------------------

Hi Jason,

I took an initial look at your code last night. I didn't actually execute 
anything, just followed method calls around to see what it was up to.

My first comment is sort of boring, but there are virtually no javadocs for the 
core classes such as TransactionSystem, Batch and Index. It would be great if 
there was a bit at the class level explaining which classes they interact with 
and how. It would also be very helpful to have method-level javadocs for 
at least the top-level commit-related logic.

One thing that caught my attention early is this method in TransactionSystem:
{code:java}
  public OceanSearcher getSearcher() throws IOException {
    Snapshot snapshot = snapshots.getLatestSnapshot();
    if (searcherPolicy instanceof SingleThreadSearcherPolicy) {
      return new OceanSearcher(snapshot);
    } else {
      return new OceanMultiThreadSearcher(snapshot, searchThreadPool);
    }
  }
{code}

Am I supposed to call this method for each query (as suggested by the method 
name), or is this a factory method used to update my own Searcher instance 
after committing documents to the index (as suggested by the code)? 

It's not such a big deal, but I personally think you should refactor the 
instanceof to a Policy.searcherFactory method, or perhaps even a 
SearcherPolicyVisitor. Actually, this goes for a few other places in the module 
too: you have used instanceof and unchecked casting more extensively to 
solve problems than I would have. But as it does not seem to be used in 
places where it would be a costly thing to do, these comments are merely about 
code readability and gut feelings about future problems. 
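To make the suggestion concrete, here is a minimal, self-contained sketch of what I mean by moving the decision into the policy itself. The Ocean classes below are stand-in stubs with assumed signatures (the real Snapshot, OceanSearcher and OceanMultiThreadSearcher in the patch obviously differ); the point is only that getSearcher() would delegate to the policy instead of doing an instanceof check:

```java
import java.util.concurrent.ExecutorService;

// Stand-in stubs for the Ocean classes (assumed shapes, not the patch's real ones).
class Snapshot {}

class OceanSearcher {
  final String kind;
  OceanSearcher(Snapshot s) { this.kind = "single"; }
  OceanSearcher(Snapshot s, String kind) { this.kind = kind; }
}

class OceanMultiThreadSearcher extends OceanSearcher {
  OceanMultiThreadSearcher(Snapshot s, ExecutorService pool) { super(s, "multi"); }
}

// The policy itself knows which searcher to build, so no instanceof is needed.
interface SearcherPolicy {
  OceanSearcher newSearcher(Snapshot snapshot);
}

class SingleThreadSearcherPolicy implements SearcherPolicy {
  public OceanSearcher newSearcher(Snapshot snapshot) {
    return new OceanSearcher(snapshot);
  }
}

class MultiThreadSearcherPolicy implements SearcherPolicy {
  private final ExecutorService pool;
  MultiThreadSearcherPolicy(ExecutorService pool) { this.pool = pool; }
  public OceanSearcher newSearcher(Snapshot snapshot) {
    return new OceanMultiThreadSearcher(snapshot, pool);
  }
}

public class PolicyFactoryDemo {
  public static void main(String[] args) {
    SearcherPolicy policy = new SingleThreadSearcherPolicy();
    // TransactionSystem.getSearcher() would now just delegate:
    OceanSearcher searcher = policy.newSearcher(new Snapshot());
    System.out.println(searcher.kind);
  }
}
```

With this shape, adding a third policy later means adding a class, not another branch in TransactionSystem.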

I'm a bit concerned about the potential loss of data while documents only 
reside in InstantiatedIndex or RAMDirectory. I think I'd like an option for 
some sort of transaction log that could be replayed in case of a crash. I 
think the easiest way would be to convert all documents to be pre-analyzed 
(field.tokenStream) before passing them on to the instantiated writer. I don't 
know how much resources that might consume, but it would make me feel safer. 
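To sketch what I'm picturing (without depending on the Lucene jars here): tokenize every field eagerly at commit time, so the log records the analyzed tokens and replay after a crash doesn't need the original Analyzer at all. The PreAnalyzer below is a deliberately dumb stand-in for Analyzer.tokenStream() (whitespace split plus lowercasing), purely for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// What a transaction-log entry could carry: the field name plus its
// already-analyzed tokens, ready to be replayed into the index.
class PreAnalyzedField {
  final String name;
  final List<String> tokens;
  PreAnalyzedField(String name, List<String> tokens) {
    this.name = name;
    this.tokens = tokens;
  }
}

class PreAnalyzer {
  // Stand-in for Analyzer.tokenStream(): whitespace split, lowercased.
  static PreAnalyzedField analyze(String name, String text) {
    List<String> tokens = new ArrayList<>();
    for (String t : text.split("\\s+")) {
      if (!t.isEmpty()) tokens.add(t.toLowerCase());
    }
    return new PreAnalyzedField(name, tokens);
  }
}

public class PreAnalyzeDemo {
  public static void main(String[] args) {
    // Analyze once, up front; the token list is what would go into the log.
    PreAnalyzedField field = PreAnalyzer.analyze("contents", "Hello World");
    System.out.println(field.name + " -> " + field.tokens);
  }
}
```

The cost question stands: holding token lists for every in-flight document is extra memory, but it buys crash recovery that is independent of analyzer configuration.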


     karl

> Ocean Realtime Search
> ---------------------
>
>                 Key: LUCENE-1313
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1313
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>            Reporter: Jason Rutherglen
>         Attachments: lucene-1313.patch, lucene-1313.patch, lucene-1313.patch, 
> lucene-1313.patch
>
>
> Provides realtime search using Lucene.  Conceptually, updates are divided 
> into discrete transactions.  The transaction is recorded to a transaction log 
> which is similar to the mysql bin log.  Deletes from the transaction are made 
> to the existing indexes.  Document additions are made to an in memory 
> InstantiatedIndex.  The transaction is then complete.  After each transaction 
> TransactionSystem.getSearcher() may be called which allows searching over the 
> index including the latest transaction.
> TransactionSystem is the main class.  Methods similar to IndexWriter are 
> provided for updating.  getSearcher returns a Searcher class. 
> - getSearcher()
> - addDocument(Document document)
> - addDocument(Document document, Analyzer analyzer)
> - updateDocument(Term term, Document document)
> - updateDocument(Term term, Document document, Analyzer analyzer)
> - deleteDocument(Term term)
> - deleteDocument(Query query)
> - commitTransaction(List<Document> documents, Analyzer analyzer, List<Term> 
> deleteByTerms, List<Query> deleteByQueries)
> Sample code:
> {code}
> // setup
> FSDirectoryMap directoryMap = new FSDirectoryMap(new File("/testocean"), 
> "log");
> LogDirectory logDirectory = directoryMap.getLogDirectory();
> TransactionLog transactionLog = new TransactionLog(logDirectory);
> TransactionSystem system = new TransactionSystem(transactionLog, new 
> SimpleAnalyzer(), directoryMap);
> // transaction
> Document d = new Document();
> d.add(new Field("contents", "hello world", Field.Store.YES, 
> Field.Index.TOKENIZED));
> system.addDocument(d);
> // search
> OceanSearcher searcher = system.getSearcher();
> ScoreDoc[] hits = searcher.search(query, null, 1000).scoreDocs;
> System.out.println(hits.length + " total results");
> for (int i = 0; i < hits.length && i < 10; i++) {
>   Document d = searcher.doc(hits[i].doc);
>   System.out.println(i + " " + hits[i].score + " " + d.get("contents"));
> }
> {code}
> There is a test class org.apache.lucene.ocean.TestSearch that was used for 
> basic testing.  
> A sample disk directory structure is as follows:
> |/snapshot_105_00.xml | XML file containing which indexes and their 
> generation numbers correspond to a snapshot.  Each transaction creates a new 
> snapshot file.  In this file the 105 is the snapshotid, also known as the 
> transactionid.  The 00 is the minor version of the snapshot corresponding to 
> a merge.  A merge is a minor snapshot version because the data does not 
> change, only the underlying structure of the index|
> |/3 | Directory containing an on disk Lucene index|
> |/log | Directory containing log files|
> |/log/log00000001.bin | Log file.  As new log files are created the suffix 
> number is incremented|

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
