I considered getting Lucene in action but figured I'll wait for the DVD to come out ;). Seriously though, they write about RemoteSearchable and use RMI, Is this the recommended solution? does it scale well? Thanks
On 2/20/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
Well, there is also a Remote cousin there. That will let you distribute your indices over N severs (sounds like you'll need multiple). You should really take a stroll through Lucene's javadoc, it's incredibly nice now in winter time. Or ... clears throat.... you could get a book ;) Otis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simpy -- http://www.simpy.com/ - Tag - Search - Share ----- Original Message ---- From: shai deljo <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, February 20, 2007 2:05:25 PM Subject: Re: Using Lucene - Design Question Hi, Thanks for the reply. * Regarding hardware I'll use something similar to: Core 2 Duo - 2.66GHz, 2x300 GB disk drives, 4 GB RAM running on one of the Linux distributions. * Regarding response time I'm looking to be ~300 milliseconds for at least 80% of queries and ~500 milliseconds for 95% of queries. * Will MultiSearcher (and it's parallel cosine :) ) allow me to search indices cross multiple servers or is the assumption is that all indices are on 1 server? Thanks On 2/20/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Hi Shi, > > Nobody will be able to give you the precise answer, obviously. The best way is to try. > You didn't say what response time is desirable nor what kind of hardware you will be using. > > I wouldn't bother with the Berkeley DB-backed Lucene index for now, just use the regular one (maybe use non-compound format). > If you need to partition your index, MultiSearcher will help you search all your indices, and its Parallel cousin will let you parallelize those searches. > It sounds like rsync will work, but you'll have to make sure that the segments file gets rsynced last. > > Otis > > . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . > Simpy -- http://www.simpy.com/ - Tag - Search - Share > > ----- Original Message ---- > From: shai deljo <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Tuesday, February 20, 2007 5:51:13 AM > Subject: Using Lucene - Design Question > > Hi, > I have no experience with Lucene and I'm trying to collect some > information in order to determine what solution is best for me. > I need to index ~50M documents (starting with 10M), the size of each > document is ~2k-~5k and I'll index a couple of fields per document. I > expect ~20 queries per seconds and each query is ~4 terms. Update rate > - not sure what is best and/or possible strategy based on performance, > i.e. incremental indexing vs. pushing a full index but as far as the > product is concerned most data can be updated daily, the head (let's > say 20%) needs hourly (or at least on the order of hours) update. > I also need to be able to override the scoring/ranking and inject my > own logic and of course my main concern is response time, especially > since i have additional computation on the hits before returning the > results. > > BTW, for the additional ranking/computation i will need to retrieve > values that are mapped by a term-field key, i.e. i can't know the key > until i have the result and the query in my hands. i figured i would > use Oracle Berkeley DB Java edition in order to keep the calls as much > as possible in the memory -> any advise on this as well ? > > For these requirements, do i need to worry about partitioning the > Index? If i do partition it, is there a solution to merge the results > back or do i need to do it on my own (does Solr do it for me and if it > does, can i override the scoring there)? > AS far as serving multiple users, will a simple rsync of the index > between multiple nodes running the same index (i am not that sensitive > to data integrity) work or do i need to look at something like > terracotta? > > In short, i am looking for the simplest solution. > > Thanks in advance. > Shi > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]