Yes, both Marcelo and I would be interested. We looked into H2 and it looks like something similar to Oracle's ODCI can be implemented. Plus, the primitive full-text implementation is based on Lucene. I say primitive because, looking at the code, I saw that one cannot define an Analyzer, and for each scan corresponding to a WHERE clause a searcher is opened and closed instead of being taken from a pool. It also has no way to queue changes to reduce the use of the IndexWriter, etc.
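A minimal sketch of what a shared searcher and a change queue could look like, in plain Java with stand-in types. None of these names come from H2 or Lucene: `Searcher`, `SharedSearcher`, `ChangeQueue` and `snapshotSearcher` are hypothetical, and the "index" is just an in-memory list. A real version would also need reference counting so that a searcher is not closed while a scan is still using it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Sketch only: every type here is a hypothetical stand-in, not an H2 or Lucene API. */
class SearcherPoolSketch {

    /** Stand-in for a searcher that is expensive to open. */
    interface Searcher extends AutoCloseable {
        List<String> search(String query);
        @Override void close();
    }

    /** Shares one open searcher across scans instead of opening one per scan. */
    static final class SharedSearcher {
        private final Supplier<Searcher> opener;
        private Searcher current;
        private int opens;

        SharedSearcher(Supplier<Searcher> opener) { this.opener = opener; }

        synchronized Searcher acquire() {
            if (current == null) { current = opener.get(); opens++; }
            return current;
        }

        /** Drops the cached searcher so the next scan sees flushed changes. */
        synchronized void invalidate() {
            if (current != null) { current.close(); current = null; }
        }

        synchronized int openCount() { return opens; }
    }

    /** Buffers changes and applies them in batches, so the writer is used less often. */
    static final class ChangeQueue {
        private final List<String> pending = new ArrayList<>();
        private final List<String> index;   // stand-in for the real index
        private final SharedSearcher searchers;
        private final int batchSize;
        private int flushes;

        ChangeQueue(List<String> index, SharedSearcher searchers, int batchSize) {
            this.index = index;
            this.searchers = searchers;
            this.batchSize = batchSize;
        }

        synchronized void add(String doc) {
            pending.add(doc);
            if (pending.size() >= batchSize) flush();
        }

        synchronized void flush() {
            if (pending.isEmpty()) return;
            synchronized (index) { index.addAll(pending); }
            pending.clear();
            flushes++;
            searchers.invalidate();
        }

        synchronized int flushCount() { return flushes; }
    }

    /** Opens a searcher over a point-in-time copy of the index. */
    static Searcher snapshotSearcher(List<String> index) {
        final List<String> snapshot;
        synchronized (index) { snapshot = new ArrayList<>(index); }
        return new Searcher() {
            @Override public List<String> search(String query) {
                List<String> hits = new ArrayList<>();
                for (String doc : snapshot) if (doc.contains(query)) hits.add(doc);
                return hits;
            }
            @Override public void close() { /* nothing to release in this stand-in */ }
        };
    }

    /** Returns {flushes, searcher opens, hits of the last scan}. */
    static int[] demo() {
        List<String> index = new ArrayList<>();
        SharedSearcher shared = new SharedSearcher(() -> snapshotSearcher(index));
        ChangeQueue queue = new ChangeQueue(index, shared, 3);
        for (int i = 0; i < 6; i++) queue.add("doc " + i);   // six adds, batches of three
        int hits = 0;
        for (int i = 0; i < 10; i++) hits = shared.acquire().search("doc").size(); // ten scans
        return new int[] { queue.flushCount(), shared.openCount(), hits };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("flushes=" + r[0] + " opens=" + r[1] + " hits=" + r[2]);
    }
}
```

In this toy run, six adds cause two batched flushes and ten scans share a single open searcher, which is the behaviour the current full-text code lacks.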
But it's open source and that is a great starting point!

-- Joaquin

On Mon, Sep 8, 2008 at 2:05 PM, Jason Rutherglen <[EMAIL PROTECTED]> wrote:

> Perhaps an interesting project would be to integrate Ocean with H2
> (www.h2database.com) to take advantage of both models. I'm not sure how
> exactly that would work, but it seems like it would not be too
> difficult. Perhaps this would solve being able to perform faster
> hierarchical queries and perhaps other types of queries that Lucene is
> not capable of.
>
> Is this something you are interested in collaborating on, Joaquin? I
> am definitely interested in it.
>
> On Sun, Sep 7, 2008 at 4:04 AM, J. Delgado <[EMAIL PROTECTED]> wrote:
> > On Sat, Sep 6, 2008 at 1:36 AM, Otis Gospodnetic
> > <[EMAIL PROTECTED]> wrote:
> >>
> >> Regarding real-time search and Solr, my feeling is the focus should
> >> be on first adding real-time search to Lucene, and then we'll figure
> >> out how to incorporate that into Solr later.
> >
> > Otis, what do you mean exactly by "adding real-time search to Lucene"?
> > Note that Lucene, being an indexing/search library (and not a
> > full-blown search engine), is by definition "real-time": once you
> > add/write a document to the index it becomes immediately searchable,
> > and a deleted document is logically deleted and no longer returned in
> > a search, though physical deletion happens during an index
> > optimization.
> >
> > Now, the problem of adding/deleting documents in bulk, as part of a
> > transaction, and making these documents available for search
> > immediately after the transaction is committed sounds more like a
> > search engine problem (i.e. SOLR, Nutch, Ocean), especially if these
> > transactions are known to be I/O expensive and thus are usually
> > implemented as batched processes with some kind of sync mechanism,
> > which makes them non-real-time.
> >
> > For example, in my previous life, I designed and helped implement a
> > quasi-realtime enterprise search engine using Lucene, having a set of
> > multi-threaded indexers hitting a set of multiple indexes allocated
> > across different search services which powered a broker-based
> > distributed search interface. The most recent documents provided to
> > the indexers were always added to the smaller in-memory (RAM) indexes,
> > which usually could absorb the load of a bulk "add" transaction and
> > later would be merged into larger disk-based indexes and then flushed
> > to make them ready to absorb new fresh docs. We even had further
> > partitioning of the indexes that reflected time periods, with caps on
> > size for them to be merged into older, more archive-based indexes
> > which were used less (yes, the search engine's default search was on
> > data no more than 1 month old, though the user could open the time
> > window by including archives).
> >
> > As for SOLR and OCEAN, I would argue that these semi-structured search
> > engines are becoming more and more like relational databases with
> > full-text search capabilities (without the benefit of full relational
> > algebra -- for example, joins are not possible using SOLR). Notice
> > that "real-time" CRUD operations and transactionality are core DB
> > concepts and have been studied and developed by database communities
> > for quite a long time. There have been recent efforts on how to
> > efficiently integrate Lucene into relational databases (see Lucene JVM
> > ORACLE integration, see
> > http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html)
> >
> > I think we should seriously look at joining efforts with open-source
> > database engine projects written in Java (see
> > http://java-source.net/open-source/database-engines) in order to blend
> > IR and ORM once and for all.
> >
> > -- Joaquin
> >
> >>
> >> I've read Jason's Wiki as well.
> >> Actually, I had to read it a number of times to understand bits and
> >> pieces of it. I have to admit there is still some fuzziness about the
> >> whole thing in my head - is "Ocean" something that already works, a
> >> separate project on googlecode.com? I think so. If so, and if you
> >> are working on getting it integrated into Lucene, would it make it
> >> less confusing to just refer to it as "real-time search", so there is
> >> no confusion?
> >>
> >> If this is to be initially integrated into Lucene, why are things
> >> like replication, crowding/field collapsing, locallucene, name
> >> service, tag index, etc. all mentioned there on the Wiki and bundled
> >> with the description of how real-time search works and is to be
> >> implemented? I suppose mentioning replication kind-of makes sense
> >> because the replication approach is closely tied to real-time search
> >> - all query nodes need to see index changes fast. But Lucene itself
> >> offers no replication mechanism, so maybe the replication is
> >> something to figure out separately, say on the Solr level, later on
> >> "once we get there". I think even just the essential real-time
> >> search requires substantial changes to Lucene (I remember seeing
> >> large patches in JIRA), which makes it hard to digest, understand,
> >> comment on, and ultimately commit (hence the lukewarm response, I
> >> think). Bringing other non-essential elements into the discussion at
> >> the same time makes it more difficult to process all this new stuff,
> >> at least for me. Am I the only one who finds this hard?
> >>
> >> That said, it sounds like we have some discussion going (Karl...), so
> >> I look forward to understanding more!
> >> :)
> >>
> >> Otis
> >> --
> >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>
> >> ----- Original Message ----
> >> > From: Yonik Seeley <[EMAIL PROTECTED]>
> >> > To: [email protected]
> >> > Sent: Thursday, September 4, 2008 10:13:32 AM
> >> > Subject: Re: Realtime Search for Social Networks Collaboration
> >> >
> >> > On Wed, Sep 3, 2008 at 6:50 PM, Jason Rutherglen wrote:
> >> > > I also think it's got a lot of things now which makes
> >> > > integration difficult to do properly.
> >> >
> >> > I agree, and that's why the major bump in version number rather
> >> > than minor - we recognize that some features will need some amount
> >> > of rearchitecture.
> >> >
> >> > > I think the problem with integration with SOLR is it was designed
> >> > > with a different problem set in mind than Ocean, originally the
> >> > > CNET shopping application.
> >> >
> >> > That was the first use of Solr, but it actually existed before that
> >> > w/o any defined use other than to be a "plan B" alternative to
> >> > MySQL based search servers (that's actually where some of the
> >> > parameter names come from... the default /select URL instead of
> >> > /search, the "rows" parameter, etc).
> >> >
> >> > But you're right... some things like the replication strategy were
> >> > designed (well, borrowed from Doug to be exact) with the idea that
> >> > it would be OK to have slightly "stale" views of the data in the
> >> > range of minutes. It just made things easier/possible at the time.
> >> > But tons of Solr and Lucene users want almost instantaneous
> >> > visibility of added documents, if they can get it. It's hardly
> >> > restricted to social network applications.
> >> >
> >> > Bottom line is that Solr aims to be a general enterprise search
> >> > platform, and getting as real-time as we can get, and as scalable
> >> > as we can get are some of the top priorities going forward.
> >> >
> >> > -Yonik
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> >> > For additional commands, e-mail: [EMAIL PROTECTED]
