You could also use Hadoop RPC or ICE (www.zeroc.com). I'm on that path now.
--Venkat

--- shai deljo <[EMAIL PROTECTED]> wrote:

> I considered getting Lucene in Action but figured I'll wait for the
> DVD to come out ;).
> Seriously though, they write about RemoteSearchable and use RMI. Is
> this the recommended solution? Does it scale well?
> Thanks
>
> On 2/20/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> > Well, there is also a Remote cousin there. That will let you
> > distribute your indices over N servers (sounds like you'll need
> > multiple). You should really take a stroll through Lucene's javadoc,
> > it's incredibly nice now in winter time. Or ... clears throat....
> > you could get a book ;)
> >
> > Otis
> > Simpy -- http://www.simpy.com/ - Tag - Search - Share
> >
> > ----- Original Message ----
> > From: shai deljo <[EMAIL PROTECTED]>
> > To: java-user@lucene.apache.org
> > Sent: Tuesday, February 20, 2007 2:05:25 PM
> > Subject: Re: Using Lucene - Design Question
> >
> > Hi,
> > Thanks for the reply.
> > * Regarding hardware, I'll use something similar to: Core 2 Duo -
> >   2.66GHz, 2x300 GB disk drives, 4 GB RAM, running on one of the
> >   Linux distributions.
> > * Regarding response time, I'm looking for ~300 milliseconds for at
> >   least 80% of queries and ~500 milliseconds for 95% of queries.
> > * Will MultiSearcher (and its parallel cosine :) ) allow me to search
> >   indices across multiple servers, or is the assumption that all
> >   indices are on 1 server?
> > Thanks
> >
> > On 2/20/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> > > Hi Shi,
> > >
> > > Nobody will be able to give you the precise answer, obviously.
> > > The best way is to try.
> > > You didn't say what response time is desirable nor what kind of
> > > hardware you will be using.
> > >
> > > I wouldn't bother with the Berkeley DB-backed Lucene index for
> > > now, just use the regular one (maybe use non-compound format).
> > > If you need to partition your index, MultiSearcher will help you
> > > search all your indices, and its Parallel cousin will let you
> > > parallelize those searches.
> > > It sounds like rsync will work, but you'll have to make sure that
> > > the segments file gets rsynced last.
> > >
> > > Otis
> > > Simpy -- http://www.simpy.com/ - Tag - Search - Share
> > >
> > > ----- Original Message ----
> > > From: shai deljo <[EMAIL PROTECTED]>
> > > To: java-user@lucene.apache.org
> > > Sent: Tuesday, February 20, 2007 5:51:13 AM
> > > Subject: Using Lucene - Design Question
> > >
> > > Hi,
> > > I have no experience with Lucene and I'm trying to collect some
> > > information in order to determine what solution is best for me.
> > > I need to index ~50M documents (starting with 10M), the size of
> > > each document is ~2k-~5k and I'll index a couple of fields per
> > > document. I expect ~20 queries per second and each query is ~4
> > > terms. Update rate - not sure what is the best and/or possible
> > > strategy based on performance, i.e. incremental indexing vs.
> > > pushing a full index, but as far as the product is concerned most
> > > data can be updated daily; the head (let's say 20%) needs hourly
> > > (or at least on the order of hours) updates.
> > > I also need to be able to override the scoring/ranking and inject
> > > my own logic, and of course my main concern is response time,
> > > especially since I have additional computation on the hits before
> > > returning the results.
> > >
> > > BTW, for the additional ranking/computation I will need to
> > > retrieve values that are mapped by a term-field key, i.e. I can't
> > > know the key until I have the result and the query in my hands. I
> > > figured I would use Oracle Berkeley DB Java Edition in order to
> > > keep the calls in memory as much as possible -> any advice on
> > > this as well?
> > >
> > > For these requirements, do I need to worry about partitioning the
> > > index? If I do partition it, is there a solution to merge the
> > > results back or do I need to do it on my own (does Solr do it for
> > > me and if it does, can I override the scoring there)?
> > > As far as serving multiple users, will a simple rsync of the
> > > index between multiple nodes running the same index (I am not
> > > that sensitive to data integrity) work, or do I need to look at
> > > something like Terracotta?
> > >
> > > In short, I am looking for the simplest solution.
> > >
> > > Thanks in advance.
> > > Shi

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
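On the question in the thread about merging results back from a partitioned index: the following is only a minimal sketch of the general idea, not Lucene's MultiSearcher or anything from the posters' setups. The ShardMerge class, the Hit type and the mergeTopN method are hypothetical names made up for illustration. It also assumes that scores coming from different partitions are directly comparable, which may not hold if each partition computes its term statistics independently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene: merges hits from several index partitions.
final class ShardMerge {

    /** One hit as returned by a single partition (illustrative type, not a Lucene class). */
    static final class Hit {
        final String docId;
        final float score;

        Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Returns the n highest-scoring hits across all partitions, best first. */
    static List<Hit> mergeTopN(List<List<Hit>> perShard, int n) {
        List<Hit> merged = new ArrayList<>();
        if (n <= 0) {
            return merged;
        }
        // Min-heap ordered by score: the weakest hit kept so far is at the head.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (List<Hit> shard : perShard) {
            for (Hit h : shard) {
                if (heap.size() < n) {
                    heap.add(h);
                } else if (h.score > heap.peek().score) {
                    heap.poll();   // drop the current weakest hit
                    heap.add(h);
                }
            }
        }
        merged.addAll(heap);
        // Sort the survivors so the best hit comes first.
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> shardA = Arrays.asList(new Hit("a1", 0.9f), new Hit("a2", 0.4f));
        List<Hit> shardB = Arrays.asList(new Hit("b1", 0.7f), new Hit("b2", 0.6f));
        for (Hit h : mergeTopN(Arrays.asList(shardA, shardB), 3)) {
            System.out.println(h.docId + " " + h.score);
        }
    }
}
```

Custom ranking logic of the kind described in the original question could be applied to the merged list before it is returned; how the per-partition lists reach the merging node (RMI, Hadoop RPC, ICE) is left open here.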