Re: Using Lucene - Design Question

Venkat Seeth Tue, 20 Feb 2007 21:22:52 -0800

You could also use Hadoop RPC or ICE (www.zeroc.com)

I'm on that path now.


--Venkat

--- shai deljo <[EMAIL PROTECTED]> wrote:

> I considered getting  Lucene in action but figured
> I'll wait for the
> DVD to come out ;).
> Seriously though, they write about RemoteSearchable
> and use RMI, Is
> this the recommended solution? does it scale well?
> Thanks
> 
> On 2/20/07, Otis Gospodnetic
> <[EMAIL PROTECTED]> wrote:
> > Well, there is also a Remote cousin there.  That
> will let you distribute your indices over N severs
> (sounds like you'll need multiple).  You should
> really take a stroll through Lucene's javadoc, it's
> incredibly nice now in winter time.  Or ... clears
> throat.... you could get a book ;)
> >
> > Otis
> >  . . . . . . . . . . . . . . . . . . . . . . . . .
> . . . . .
> > Simpy -- http://www.simpy.com/  -  Tag  -  Search 
> -  Share
> >
> > ----- Original Message ----
> > From: shai deljo <[EMAIL PROTECTED]>
> > To: java-user@lucene.apache.org
> > Sent: Tuesday, February 20, 2007 2:05:25 PM
> > Subject: Re: Using Lucene - Design Question
> >
> > Hi,
> > Thanks for the reply.
> > * Regarding hardware I'll use something similar
> to: Core 2 Duo -
> > 2.66GHz, 2x300 GB disk drives, 4 GB RAM running on
> one of the Linux
> > distributions.
> > * Regarding response time I'm looking to be ~300
> milliseconds for at
> > least 80% of queries and ~500 milliseconds for 95%
> of queries.
> > * Will MultiSearcher (and it's parallel cosine :)
> ) allow me to search
> > indices cross multiple servers or is the
> assumption is that all
> > indices are on 1 server?
> > Thanks
> >
> >
> > On 2/20/07, Otis Gospodnetic
> <[EMAIL PROTECTED]> wrote:
> > > Hi Shi,
> > >
> > > Nobody will be able to give you the precise
> answer, obviously.  The best way is to try.
> > > You didn't say what response time is desirable
> nor what kind of hardware you will be using.
> > >
> > > I wouldn't bother with the Berkeley DB-backed
> Lucene index for now, just use the regular one
> (maybe use non-compound format).
> > > If you need to partition your index,
> MultiSearcher will help you search all your indices,
> and its Parallel cousin will let you parallelize
> those searches.
> > > It sounds like rsync will work, but you'll have
> to make sure that the segments file gets rsynced
> last.
> > >
> > > Otis
> > >
> > > . . . . . . . . . . . . . . . . . . . . . . . .
> . . . . . .
> > > Simpy -- http://www.simpy.com/  -  Tag  - 
> Search  -  Share
> > >
> > > ----- Original Message ----
> > > From: shai deljo <[EMAIL PROTECTED]>
> > > To: java-user@lucene.apache.org
> > > Sent: Tuesday, February 20, 2007 5:51:13 AM
> > > Subject: Using Lucene - Design Question
> > >
> > > Hi,
> > > I have no experience with Lucene and I'm trying
> to collect some
> > > information in order to determine what solution
> is best for me.
> > > I need to index ~50M documents (starting with
> 10M), the size of each
> > > document is ~2k-~5k and I'll index a couple of
> fields per document. I
> > > expect ~20 queries per seconds and each query is
> ~4 terms. Update rate
> > > - not sure what is best and/or possible strategy
> based on performance,
> > > i.e. incremental indexing vs. pushing a full
> index but as far as the
> > > product is concerned most data can be updated
> daily, the head (let's
> > > say 20%) needs hourly (or at least on the order
> of hours) update.
> > > I also need to be able to override the
> scoring/ranking and inject my
> > > own logic and of course  my main concern is
> response time, especially
> > > since i have additional computation on the hits
> before returning the
> > > results.
> > >
> > > BTW, for the additional ranking/computation i
> will need to retrieve
> > > values that are mapped by a term-field key, i.e.
> i can't know the key
> > > until i have the result and the query in my
> hands. i figured i would
> > > use Oracle Berkeley DB Java edition in order to
> keep the calls as much
> > > as possible in the memory -> any advise on this
> as well ?
> > >
> > > For these requirements, do i need to worry about
> partitioning the
> > > Index? If i do partition it, is there a solution
> to merge the results
> > > back or do i need to do it on my own (does Solr
> do it for me and if it
> > > does, can i override the scoring there)?
> > > AS far as serving multiple users, will a simple
> rsync of the index
> > > between multiple nodes running the same index (i
> am not that sensitive
> > > to data integrity) work or do i need to look at
> something like
> > > terracotta?
> > >
> > > In short, i am looking for the simplest
> solution.
> > >
> > > Thanks in advance.
> > > Shi
> > >
> > >
>
---------------------------------------------------------------------
> > > To unsubscribe, e-mail:
> [EMAIL PROTECTED]
> > > For additional commands, e-mail:
> [EMAIL PROTECTED]
> > >
> > >
> > >
> > >
> > >
> > >
>
---------------------------------------------------------------------
> > > To unsubscribe, e-mail:
> [EMAIL PROTECTED]
> > > For additional commands, e-mail:
> [EMAIL PROTECTED]
> > >
> > >
> >
> >
>
---------------------------------------------------------------------
> > To unsubscribe, e-mail:
> [EMAIL PROTECTED]
> > For additional commands, e-mail:
> [EMAIL PROTECTED]
> >
> >
> >
> >
> >
> >
>
---------------------------------------------------------------------
> > To unsubscribe, e-mail:
> [EMAIL PROTECTED]
> > For additional commands, e-mail:
> [EMAIL PROTECTED]
> >
> >
> 
>
---------------------------------------------------------------------
> To unsubscribe, e-mail:
> [EMAIL PROTECTED]
> For additional commands, e-mail:
> [EMAIL PROTECTED]
> 
> 



 
____________________________________________________________________________________
Have a burning question?  
Go to www.Answers.yahoo.com and get answers from real people who know.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Using Lucene - Design Question

Reply via email to