RE: Ocean Documentation

Ard Schrijvers Tue, 15 Jul 2008 03:39:35 -0700

> Jason Rutherglen wrote:
> I took a look at Jackrabbit, which is a very cool animal,


:-)

> and there are similar ideas in the Lucene portion.  I will 
> try to take a look at the source to get a better understanding.  

The Jackrabbit indexing code is pretty much tied to Jackrabbit and
JSR-170, though, so a large portion of the code deals with resolving
hierarchies and XPath/SQL queries. I suppose Ocean is much more generic
and reusable; still, the underlying requirement, that changes be
reflected instantly in search results, is the same.

-Ard

> 
> 
> On Fri, Jul 11, 2008 at 9:09 AM, Ard Schrijvers <[EMAIL PROTECTED]> wrote:
> 
> 
>       Hello Jason et al,
>       
>       Indeed there are plenty of use cases for searches that must
>       reflect updates instantly, for example the JSR-170 (JCR)
>       compliant Jackrabbit implementation: it relies heavily on Lucene
>       for searching and hierarchy resolving, and according to the
>       JSR-170 spec, changes must be visible instantly after a save().
>       
>       Also, I think a solution very similar to yours is implemented
>       there: see [1] if you like.
>       
>       Regards Ard
>       
>       [1] http://jackrabbit.apache.org/index-readers.html
>       
> 
> 
> 
>       > I started a wiki page at
>       > http://wiki.apache.org/lucene-java/OceanRealtimeSearch linked
>       > from http://wiki.apache.org/lucene-java/LuceneResources.
>       >
>       > Perhaps I should add some background on the wiki.  I can add
>       > a little bit here.  I was an early Solr developer/user at a
>       > social networking company when Google's GData came out.  It
>       > looked similar to Solr so I took a look at it.  The one thing
>       > it had over Solr was realtime updates or the ability to add,
>       > delete, or update a document and be able to see the update in
>       > search results immediately.  With Solr, the company had
>       > decided on a 10-minute interval for updating the index with
>       > delta updates from an Oracle database.  I wanted to see
>       > whether it was possible to build an approximation of what
>       > GData does with Lucene.  The result is Ocean.
>       >
>       > The use case it was designed for is websites with dynamic
>       > data, such as social networking sites, photo sites,
>       > discussion boards, blogs, wikis, and the like.  More broadly,
>       > Ocean can be used with any application that needs the
>       > database-like feature of immediate updates.  Probably the
>       > best example of this is that all of Google's web
>       > applications, outside of web search, use a GData interface,
>       > meaning the primary datastore is not MySQL or some
>       > equivalent but a proprietary search-based database.  The best
>       > example of this is Gmail.  If I receive an email through
>       > Gmail I can search on it immediately; there is no 10-minute
>       > delay.  Gmail also lets me change labels, a common example
>       > being marking unread emails as read in bulk.  Presumably
>       > Gmail is not reindexing the entire email for each label change.
>       >
>       > Most highly trafficked web applications do not use
>       > relational facilities like joins because they are too
>       > expensive.  Lucene does not offer joins, so this is fine.  The
>       > only area where Lucene is currently weak is range queries.
>       > MySQL uses a B-tree index, whereas Lucene uses the
>       > time-consuming TermEnum and TermDocs combination.  This is
>       > the area the Tag Index addresses.
>       >
>       > Ocean is designed so that there should be no limitations to
>       > using it compared with Lucene's IndexWriter; it offers the
>       > same functionality.  If one does not want to use Ocean's
>       > transaction log, for example because one simply wants to
>       > index 1 million documents at once, Ocean offers what is
>       > called a LargeBatch: a way to perform a large number of
>       > updates that takes advantage of the new IndexWriter speedup,
>       > combined with transactional semantics.
>       >
>       > Karl, does this answer your question or are there areas that
>       > could use more explanation?
>       >
>       >
>       > On Fri, Jul 11, 2008 at 6:20 AM, Karl Wettin
>       > <[EMAIL PROTECTED]> wrote:
>       >
>       >
>       >
>       >       On 10 Jul 2008, at 22:08, Jason Rutherglen wrote:
>       >
>       >
>       >
>       >               Is there a good place to put Ocean
>       >               https://issues.apache.org/jira/browse/LUCENE-1313
>       >               documentation?  Is there a place on the wiki that is good?
>       >
>       >
>       >
>       >       Hi Jason,
>       >
>       >       the wiki is just fine.
>       >
>       >       I've been reading the docs and have looked at your patch.
>       >       There is a lot of text about how it does what it does, but
>       >       it says nothing about the intended use. I honestly don't
>       >       even know what you mean by "real time search". You will
>       >       probably get more attention if the documentation starts out
>       >       with some use cases or thoughts on when and why it might
>       >       make sense to use your code.
>       >
>       >
>       >             karl
>       >
>       >
>       >
>       >       ---------------------------------------------------------------------
>       >       To unsubscribe, e-mail: [EMAIL PROTECTED]
>       >       For additional commands, e-mail: [EMAIL PROTECTED]
>       >
>       >
>       >
>       >
>       >
> 
> 
> 
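Jason's remark about range queries can be made concrete with a small model. The sketch below is plain Java, not the Lucene API: a sorted map stands in for one field's term dictionary, iterating the terms between the bounds plays the role of TermEnum, and reading each term's postings plays the role of TermDocs. The class name TermRangeSketch, its methods, and the postings counter are hypothetical, added only for illustration. The point it shows is that the work grows with the number of distinct terms in the range plus the total length of their postings lists.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of one field's term dictionary; this is NOT Lucene's API.
public class TermRangeSketch {
    // Sorted term -> postings list (ascending doc ids), standing in for the index.
    private final NavigableMap<String, int[]> termDict = new TreeMap<>();
    // Running total of postings entries read, to make the cost visible.
    private long postingsVisited = 0;

    public void addTerm(String term, int... docIds) {
        termDict.put(term, docIds.clone());
    }

    public long postingsVisited() {
        return postingsVisited;
    }

    // Inclusive range [lower, upper]: enumerate every term in the range
    // (TermEnum's role), then read each term's postings (TermDocs' role)
    // and OR the doc ids into a bitset.
    public BitSet rangeQuery(String lower, String upper) {
        BitSet hits = new BitSet();
        for (Map.Entry<String, int[]> e
                : termDict.subMap(lower, true, upper, true).entrySet()) {
            for (int doc : e.getValue()) {
                hits.set(doc);
                postingsVisited++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TermRangeSketch field = new TermRangeSketch();
        // Zero-padded so that string order matches numeric order.
        field.addTerm("0005", 0, 3);
        field.addTerm("0010", 1);
        field.addTerm("0015", 2, 3);
        field.addTerm("0020", 4);
        BitSet hits = field.rangeQuery("0010", "0020");
        System.out.println(hits + " postings visited: " + field.postingsVisited());
    }
}
```

Running main prints `{1, 2, 3, 4} postings visited: 4`: the three terms from "0010" to "0020" are enumerated one by one and their four postings are OR-ed into the result. A field with many distinct values in the queried range means many more terms to walk, which is the cost the thread attributes to the TermEnum/TermDocs approach.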
