> Jason Rutherglen wrote:
> I took a look at Jackrabbit, which are a very cool animal, :-)
> and there are similar ideas in the Lucene portion. I will
> try to take a look at the source to get a better understanding.

The Jackrabbit indexing code is pretty much tied to Jackrabbit and
jsr-170 though, so a large portion of the code deals with resolving
hierarchies and xpath/sql queries. I suppose Ocean is much more generic
and reusable, though the need for changes to be reflected instantly in
search results is the same.

-Ard

> On Fri, Jul 11, 2008 at 9:09 AM, Ard Schrijvers
> <[EMAIL PROTECTED]> wrote:
>
> Hello Jason et al,
>
> Indeed there are plenty of use cases that need instantly updated
> searches, for example the jsr-170 (jcr) compliant Jackrabbit
> implementation: it relies heavily on lucene for searching and
> hierarchy resolving, and according to the jsr-170 spec, changes
> need to be visible instantly after a save().
>
> Also, I think a very similar solution to yours is implemented
> there: see [1] if you like.
>
> Regards Ard
>
> [1] http://jackrabbit.apache.org/index-readers.html
>
> > I started a wiki page at
> > http://wiki.apache.org/lucene-java/OceanRealtimeSearch linked
> > from http://wiki.apache.org/lucene-java/LuceneResources.
> >
> > Perhaps I should add some background on the wiki. I can add
> > a little bit here. I was an early Solr developer/user at a
> > social networking company when Google's GData came out. It
> > looked similar to Solr so I took a look at it. The one thing
> > it had over Solr was realtime updates: the ability to add,
> > delete, or update a document and see the update in
> > search results immediately. With Solr the company had
> > decided on a 10 minute interval for updating the index with
> > delta updates from an Oracle database. I wanted to see if it
> > was possible with Lucene to create an approximation of what
> > GData does. The result is Ocean.
> >
> > The use case it was designed for is websites with dynamic
> > data, such as social networking sites, photo sites,
> > discussion boards, blogs, and wikis. More broadly, it
> > is possible to use Ocean with any application that requires
> > the database-like feature of immediate updates. Probably the
> > best illustration is that all of Google's web applications,
> > outside of web search, use a GData interface, meaning the
> > primary datastore is not mysql or some equivalent; it is a
> > proprietary search based database. The clearest case
> > is Gmail. If I receive an email through Gmail I can
> > search on it immediately; there is no 10 minute delay. Also,
> > in Gmail I can change labels, a common example being changing
> > unread emails to read in bulk. Presumably Gmail is not
> > reindexing the entire email for each label change.
> >
> > Most highly trafficked web applications do not use
> > relational facilities like joins because they are too
> > expensive. Lucene does not offer joins, so this is fine. The
> > only area Lucene is currently weak in is range queries.
> > Mysql uses a btree index whereas Lucene uses the time
> > consuming TermEnum and TermDocs combination. This is an area
> > Tag Index addresses.
> >
> > Ocean is designed so that there should be no limitations to
> > using it compared with using the Lucene IndexWriter. It offers
> > the same functionality. If one does not want to use the
> > transaction log Ocean offers because one simply wants to
> > index 1 million documents at once, Ocean offers what is
> > called a LargeBatch. It is a way to perform a large number
> > of updates taking advantage of the new IndexWriter speedup,
> > combined with transactional semantics.
> >
> > Karl, does this answer your question or are there areas that
> > could use more explanation?
> >
> > On Fri, Jul 11, 2008 at 6:20 AM, Karl Wettin
> > <[EMAIL PROTECTED]> wrote:
> >
> > On 10 Jul 2008, at 22:08, Jason Rutherglen wrote:
> >
> > Is there a good place to put Ocean
> > https://issues.apache.org/jira/browse/LUCENE-1313
> > documentation? Is there a place on the wiki that is good?
> >
> > Hi Jason,
> >
> > the wiki is just fine.
> >
> > I've been reading the docs and looked at your patch.
> > There is a lot of text about how it does what it does, but it
> > says nothing about the intended use. I honestly
> > don't even know what you mean by "real time search". You will
> > probably get more attention if the documentation starts out
> > with some use cases or thoughts on when and why it might make
> > sense to use your code.
> >
> > karl
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]