[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Rutherglen updated LUCENE-1313:
-------------------------------------

    Attachment: lucene-1313.patch

lucene-1313.patch

Depends on LUCENE-1312 and LUCENE-1314. More bugs fixed. Deletes are now committed to the indexes only intermittently, which dramatically improves update speed. MaybeMergeIndexes now runs via a background timer. I will remove the per-transaction snapshot.xml file in favor of a human-readable log, because creating and deleting these small files is a bottleneck for update speed; that way a transaction writes to only two files. The merges happen in the background and so never affect transaction update speed. I am not sure how useful it would be, but it is possible to build a priority-based IO system that favors transactions over merges: if a transaction comes in while a merge is writing to disk, the merge is paused, the transaction IO runs, and then the merge IO continues (a rough, hypothetical sketch of this idea appears at the end of this message). I am not sure how to handle Documents whose Fields have a TokenStream as the value, since I believe these cannot be serialized; for now I assume they will be unsupported. I am also not sure how to handle analyzers: are they generally serializable? Serializing them would be useful for a more automated log recovery process.

> Ocean Realtime Search
> ---------------------
>
>                 Key: LUCENE-1313
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1313
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>            Reporter: Jason Rutherglen
>         Attachments: lucene-1313.patch, lucene-1313.patch, lucene-1313.patch
>
>
> Provides realtime search using Lucene. Conceptually, updates are divided into discrete transactions. Each transaction is recorded to a transaction log, which is similar to the MySQL bin log. Deletes from the transaction are applied to the existing indexes. Document additions are made to an in-memory InstantiatedIndex. The transaction is then complete. After each transaction, TransactionSystem.getSearcher() may be called, which allows searching over the index including the latest transaction.
> TransactionSystem is the main class. Methods similar to IndexWriter's are provided for updating, and getSearcher() returns a Searcher:
> - getSearcher()
> - addDocument(Document document)
> - addDocument(Document document, Analyzer analyzer)
> - updateDocument(Term term, Document document)
> - updateDocument(Term term, Document document, Analyzer analyzer)
> - deleteDocument(Term term)
> - deleteDocument(Query query)
> - commitTransaction(List<Document> documents, Analyzer analyzer, List<Term> deleteByTerms, List<Query> deleteByQueries)
> Sample code:
> {code}
> // setup
> FSDirectoryMap directoryMap = new FSDirectoryMap(new File("/testocean"), "log");
> LogDirectory logDirectory = directoryMap.getLogDirectory();
> TransactionLog transactionLog = new TransactionLog(logDirectory);
> TransactionSystem system = new TransactionSystem(transactionLog, new SimpleAnalyzer(), directoryMap);
> // transaction
> Document d = new Document();
> d.add(new Field("contents", "hello world", Field.Store.YES, Field.Index.TOKENIZED));
> system.addDocument(d);
> // search
> Query query = new TermQuery(new Term("contents", "hello")); // a query against the indexed field
> OceanSearcher searcher = system.getSearcher();
> ScoreDoc[] hits = searcher.search(query, null, 1000).scoreDocs;
> System.out.println(hits.length + " total results");
> for (int i = 0; i < hits.length && i < 10; i++) {
>   Document doc = searcher.doc(hits[i].doc);
>   System.out.println(i + " " + hits[i].score + " " + doc.get("contents"));
> }
> {code}
> There is a test class org.apache.lucene.ocean.TestSearch that was used for basic testing.
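For illustration, here is a minimal sketch of how the batch commitTransaction(...) method listed above might be called. It is not from the patch: it assumes the TransactionSystem instance ("system") and SimpleAnalyzer from the sample code, the field values and delete terms are purely illustrative, and imports are omitted as in the sample above.

{code}
// batch several additions and deletes into a single transaction
List<Document> docs = new ArrayList<Document>();
Document doc = new Document();
doc.add(new Field("contents", "goodbye world", Field.Store.YES, Field.Index.TOKENIZED));
docs.add(doc);

List<Term> deleteByTerms = new ArrayList<Term>();
deleteByTerms.add(new Term("contents", "hello"));

List<Query> deleteByQueries = new ArrayList<Query>();
deleteByQueries.add(new TermQuery(new Term("contents", "stale")));

// the adds and deletes above become searchable together once this call returns
system.commitTransaction(docs, new SimpleAnalyzer(), deleteByTerms, deleteByQueries);
{code}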
> A sample disk directory structure is as follows:
> ||Path||Description||
> |/snapshot_105_00.xml|XML file recording which indexes, and which generation numbers of those indexes, make up a snapshot. Each transaction creates a new snapshot file. In this name, 105 is the snapshot id (also known as the transaction id) and 00 is the minor version of the snapshot, corresponding to a merge. A merge is only a minor snapshot version because the data does not change, only the underlying structure of the index.|
> |/3|Directory containing an on-disk Lucene index.|
> |/log|Directory containing log files.|
> |/log/log00000001.bin|Log file. As new log files are created, the suffix number is incremented.|
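A small, hypothetical sketch of how the snapshot file name described in the table above decomposes into a snapshot id and a minor merge version; the class name and regular expression are illustrative only and not part of the patch.

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the snapshot_<snapshotid>_<minorversion>.xml naming convention.
public class SnapshotFileName {
  private static final Pattern NAME = Pattern.compile("snapshot_(\\d+)_(\\d+)\\.xml");

  public static void main(String[] args) {
    Matcher m = NAME.matcher("snapshot_105_00.xml");
    if (m.matches()) {
      long snapshotId = Long.parseLong(m.group(1));    // 105: the snapshot/transaction id
      int minorVersion = Integer.parseInt(m.group(2)); // 00: bumped by a merge, data unchanged
      System.out.println("snapshot " + snapshotId + ", minor version " + minorVersion);
    }
  }
}
{code}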
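Finally, the rough sketch of the priority-based IO idea mentioned in the comment above: transaction IO takes priority, and the merge thread pauses at checkpoints while any transaction is writing. This class is hypothetical, the names are illustrative, and it is not part of lucene-1313.patch.

{code}
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical gate: merge IO yields to transaction IO at checkpoint boundaries.
public class IOPriorityGate {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition noTransactions = lock.newCondition();
  private int activeTransactions = 0;

  // Called by a transaction before it starts writing to disk.
  public void beginTransactionIO() {
    lock.lock();
    try {
      activeTransactions++;
    } finally {
      lock.unlock();
    }
  }

  // Called by a transaction after its writes are flushed.
  public void endTransactionIO() {
    lock.lock();
    try {
      if (--activeTransactions == 0) {
        noTransactions.signalAll();
      }
    } finally {
      lock.unlock();
    }
  }

  // Called by the merge thread between chunks; blocks while any transaction is writing.
  public void mergeCheckpoint() throws InterruptedException {
    lock.lock();
    try {
      while (activeTransactions > 0) {
        noTransactions.await();
      }
    } finally {
      lock.unlock();
    }
  }
}
{code}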