Agreed, it's a system that is of value to a subset of cases.
On Sat, Sep 20, 2008 at 4:04 PM, Noble Paul നോബിള് नोब्ळ् <[EMAIL PROTECTED]> wrote:
> Moving back to an RDBMS model would be a big step backwards, since we
> would lose multivalued fields and arbitrary fields.
>
> On Tue, Sep 9, 2008 at 4:17 AM, Jason Rutherglen
> <[EMAIL PROTECTED]> wrote:
>> Cool. I mention H2 because it does have some Lucene code in it, yes.
>> Also, according to some benchmarks it's the fastest of the open source
>> databases. I think it's possible to integrate realtime search for H2.
>> I suppose there is no need to store the data in Lucene in this case?
>> One loses the multiple values per field Lucene offers, and the schema
>> becomes static. Perhaps it's a trade-off?
>>
>> On Mon, Sep 8, 2008 at 6:17 PM, J. Delgado <[EMAIL PROTECTED]> wrote:
>>> Yes, both Marcelo and I would be interested.
>>>
>>> We looked into H2 and it looks like something similar to Oracle's ODCI
>>> can be implemented. Plus the primitive full-text implementation is
>>> based on Lucene.
>>> I say primitive because, looking at the code, I saw that one cannot
>>> define an Analyzer, and for each scan corresponding to a where clause
>>> a searcher is opened and closed instead of being taken from a pool.
>>> It also has no way to queue changes to reduce the use of the
>>> IndexWriter, etc.
>>>
>>> But it's open source and that is a great starting point!
>>>
>>> -- Joaquin
>>>
>>> On Mon, Sep 8, 2008 at 2:05 PM, Jason Rutherglen
>>> <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Perhaps an interesting project would be to integrate Ocean with H2
>>>> (www.h2database.com) to take advantage of both models. I'm not sure
>>>> exactly how that would work, but it seems like it would not be too
>>>> difficult. Perhaps this would make it possible to perform faster
>>>> hierarchical queries, and perhaps other types of queries that Lucene
>>>> is not capable of.
>>>>
>>>> Is this something, Joaquin, that you are interested in collaborating
>>>> on? I am definitely interested in it.
>>>>
>>>> On Sun, Sep 7, 2008 at 4:04 AM, J. Delgado <[EMAIL PROTECTED]>
>>>> wrote:
>>>> > On Sat, Sep 6, 2008 at 1:36 AM, Otis Gospodnetic
>>>> > <[EMAIL PROTECTED]> wrote:
>>>> >>
>>>> >> Regarding real-time search and Solr, my feeling is the focus should
>>>> >> be on first adding real-time search to Lucene, and then we'll
>>>> >> figure out how to incorporate that into Solr later.
>>>> >
>>>> >
>>>> > Otis, what do you mean exactly by "adding real-time search to
>>>> > Lucene"? Note that Lucene, being an indexing/search library (and not
>>>> > a full-blown search engine), is by definition "real-time": once you
>>>> > add/write a document to the index it becomes immediately searchable,
>>>> > and once a document is logically deleted it is no longer returned
>>>> > in a search, though physical deletion happens during an index
>>>> > optimization.
>>>> >
>>>> > Now, the problem of adding/deleting documents in bulk, as part of a
>>>> > transaction, and making these documents available for search
>>>> > immediately after the transaction is committed sounds more like a
>>>> > search engine problem (i.e. SOLR, Nutch, Ocean), especially if these
>>>> > transactions are known to be I/O expensive and thus are usually
>>>> > implemented as batched processes with some kind of sync mechanism,
>>>> > which makes them non-real-time.
>>>> >
>>>> > For example, in my previous life, I designed and helped implement a
>>>> > quasi-realtime enterprise search engine using Lucene, having a set
>>>> > of multi-threaded indexers hitting a set of multiple indexes
>>>> > allocated across different search services which powered a
>>>> > broker-based distributed search interface.
>>>> > The most recent documents provided to the indexers were always
>>>> > added to the smaller in-memory (RAM) indexes, which usually could
>>>> > absorb the load of a bulk "add" transaction, and later would be
>>>> > merged into larger disk-based indexes and then flushed to make them
>>>> > ready to absorb new fresh docs.
>>>> > We even had further partitioning of the indexes that reflected time
>>>> > periods, with caps on size for them to be merged into older, more
>>>> > archive-based indexes which were used less (yes, the search engine's
>>>> > default search was on data no more than 1 month old, though the user
>>>> > could open the time window by including archives).
>>>> >
>>>> > As for SOLR and OCEAN, I would argue that these semi-structured
>>>> > search engines are becoming more and more like relational databases
>>>> > with full-text search capabilities (without the benefit of full
>>>> > relational algebra -- for example, joins are not possible using
>>>> > SOLR). Notice that "real-time" CRUD operations and transactionality
>>>> > are core DB concepts and have been studied and developed by database
>>>> > communities for quite a long time. There have been recent efforts on
>>>> > how to efficiently integrate Lucene into relational databases (see
>>>> > Lucene JVM ORACLE integration,
>>>> > http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html)
>>>> >
>>>> > I think we should seriously look at joining efforts with open-source
>>>> > database engine projects written in Java (see
>>>> > http://java-source.net/open-source/database-engines) in order to
>>>> > blend IR and ORM once and for all.
>>>> >
>>>> > -- Joaquin
>>>> >
>>>> >
>>>> >>
>>>> >> I've read Jason's Wiki as well. Actually, I had to read it a number
>>>> >> of times to understand bits and pieces of it.
>>>> >> I have to admit there is still some fuzziness about the whole thing
>>>> >> in my head - is "Ocean" something that already works, a separate
>>>> >> project on googlecode.com? I think so. If so, and if you are
>>>> >> working on getting it integrated into Lucene, would it make it less
>>>> >> confusing to just refer to it as "real-time search", so there is no
>>>> >> confusion?
>>>> >>
>>>> >> If this is to be initially integrated into Lucene, why are things
>>>> >> like replication, crowding/field collapsing, locallucene, name
>>>> >> service, tag index, etc. all mentioned there on the Wiki and
>>>> >> bundled with the description of how real-time search works and is
>>>> >> to be implemented? I suppose mentioning replication kind-of makes
>>>> >> sense because the replication approach is closely tied to real-time
>>>> >> search - all query nodes need to see index changes fast. But Lucene
>>>> >> itself offers no replication mechanism, so maybe the replication is
>>>> >> something to figure out separately, say on the Solr level, later on
>>>> >> "once we get there". I think even just the essential real-time
>>>> >> search requires substantial changes to Lucene (I remember seeing
>>>> >> large patches in JIRA), which makes it hard to digest, understand,
>>>> >> comment on, and ultimately commit (hence the lukewarm response, I
>>>> >> think). Bringing other non-essential elements into the discussion
>>>> >> at the same time makes it more difficult to process all this new
>>>> >> stuff, at least for me. Am I the only one who finds this hard?
>>>> >>
>>>> >> That said, it sounds like we have some discussion going (Karl...),
>>>> >> so I look forward to understanding more! :)
>>>> >>
>>>> >>
>>>> >> Otis
>>>> >> --
>>>> >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>> >>
>>>> >>
>>>> >>
>>>> >> ----- Original Message ----
>>>> >> > From: Yonik Seeley <[EMAIL PROTECTED]>
>>>> >> > To: java-dev@lucene.apache.org
>>>> >> > Sent: Thursday, September 4, 2008 10:13:32 AM
>>>> >> > Subject: Re: Realtime Search for Social Networks Collaboration
>>>> >> >
>>>> >> > On Wed, Sep 3, 2008 at 6:50 PM, Jason Rutherglen
>>>> >> > wrote:
>>>> >> > > I also think it's got a
>>>> >> > > lot of things now which makes integration difficult to do
>>>> >> > > properly.
>>>> >> >
>>>> >> > I agree, and that's why the major bump in version number rather
>>>> >> > than minor - we recognize that some features will need some
>>>> >> > amount of rearchitecture.
>>>> >> >
>>>> >> > > I think the problem with integration with SOLR is it was
>>>> >> > > designed with a different problem set in mind than Ocean,
>>>> >> > > originally the CNET shopping application.
>>>> >> >
>>>> >> > That was the first use of Solr, but it actually existed before
>>>> >> > that w/o any defined use other than to be a "plan B" alternative
>>>> >> > to MySQL based search servers (that's actually where some of the
>>>> >> > parameter names come from... the default /select URL instead of
>>>> >> > /search, the "rows" parameter, etc).
>>>> >> >
>>>> >> > But you're right... some things like the replication strategy
>>>> >> > were designed (well, borrowed from Doug to be exact) with the
>>>> >> > idea that it would be OK to have slightly "stale" views of the
>>>> >> > data in the range of minutes. It just made things easier/possible
>>>> >> > at the time. But tons of Solr and Lucene users want almost
>>>> >> > instantaneous visibility of added documents, if they can get it.
>>>> >> > It's hardly restricted to social network applications.
>>>> >> >
>>>> >> > Bottom line is that Solr aims to be a general enterprise search
>>>> >> > platform, and getting as real-time as we can get, and as scalable
>>>> >> > as we can get, are some of the top priorities going forward.
>>>> >> >
>>>> >> > -Yonik
>>>> >> >
>>>> >> > ---------------------------------------------------------------------
>>>> >> > To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>> >> > For additional commands, e-mail: [EMAIL PROTECTED]
>
> --
> --Noble Paul