Well, there is also a Remote cousin there. That will let you distribute
your indices over N servers (sounds like you'll need multiple). You
should really take a stroll through Lucene's javadoc; it's incredibly
nice now in winter time. Or ... clears throat ... you could get a book ;)
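
To make "distribute your indices over N servers" concrete, here is a minimal sketch of one way documents could be assigned to servers before indexing. It is not Lucene API: the ShardRouter class, its method names, and the choice of hashing the document ID are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Lucene): routes document IDs to one of N index servers. */
final class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    /** The same ID always maps to the same shard; floorMod keeps the result in [0, shardCount). */
    int shardFor(String docId) {
        return Math.floorMod(docId.hashCode(), shardCount);
    }

    /** Groups IDs by target shard so each server only indexes its own slice. */
    List<List<String>> partition(List<String> docIds) {
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ArrayList<>());
        }
        for (String id : docIds) {
            shards.get(shardFor(id)).add(id);
        }
        return shards;
    }
}
```

Hashing the ID keeps routing stateless, but changing N later moves most documents to a different server, so the shard count is worth choosing with growth in mind.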
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ - Tag - Search - Share
----- Original Message ----
From: shai deljo <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, February 20, 2007 2:05:25 PM
Subject: Re: Using Lucene - Design Question
Hi,
Thanks for the reply.
* Regarding hardware, I'll use something similar to: Core 2 Duo,
2.66 GHz, 2x300 GB disk drives, 4 GB RAM, running one of the Linux
distributions.
* Regarding response time, I'm looking for ~300 milliseconds for at
least 80% of queries and ~500 milliseconds for 95% of queries.
* Will MultiSearcher (and its parallel cousin :) ) allow me to search
indices across multiple servers, or is the assumption that all
indices are on one server?
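
The response-time targets above (~300 ms for at least 80% of queries, ~500 ms for 95%) can be checked against measured latencies with a small percentile helper. This is a minimal sketch: the LatencyTargets class and its method names are hypothetical, and the nearest-rank percentile definition is an assumption, since the message does not say how the percentages would be measured.

```java
import java.util.Arrays;

/** Hypothetical helper: checks measured query latencies against the stated targets. */
final class LatencyTargets {

    /** Nearest-rank percentile: the smallest sample such that at least `percent`% of samples are <= it. */
    static long percentile(long[] latenciesMs, int percent) {
        if (latenciesMs.length == 0 || percent < 1 || percent > 100) {
            throw new IllegalArgumentException("need samples and a percent in 1..100");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        // Integer ceiling of percent * n / 100, which avoids floating-point rounding surprises.
        int rank = (int) (((long) percent * sorted.length + 99) / 100);
        return sorted[rank - 1];
    }

    /** True when the 80th percentile is within 300 ms and the 95th within 500 ms. */
    static boolean meetsTargets(long[] latenciesMs) {
        return percentile(latenciesMs, 80) <= 300 && percentile(latenciesMs, 95) <= 500;
    }
}
```

Feeding it latencies recorded from a test run with realistic queries gives a direct yes/no against the two targets.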
Thanks
On 2/20/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
Hi Shi,
Nobody will be able to give you the precise answer, obviously. The
best way is to try.
You didn't say what response time is desirable nor what kind of
hardware you will be using.
I wouldn't bother with the Berkeley DB-backed Lucene index for now,
just use the regular one (maybe use non-compound format).
If you need to partition your index, MultiSearcher will help you search
all your indices, and its Parallel cousin will let you parallelize those
searches.
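
As an illustration of the idea (not of Lucene's actual classes), searching partitions in parallel comes down to sending the query to every partition at once and merging the per-partition top hits by score. In the sketch below, the Partition interface, the Hit class, and the thread-pool sizing are hypothetical stand-ins, and it assumes scores from different partitions are directly comparable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Conceptual sketch of parallel search over partitions; not Lucene API. */
final class ParallelMergeSketch {

    /** Hypothetical hit: source partition, partition-local document number, score. */
    static final class Hit {
        final int partition;
        final int doc;
        final float score;

        Hit(int partition, int doc, float score) {
            this.partition = partition;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for a searcher over one partition. */
    interface Partition {
        List<Hit> search(String query, int topN);
    }

    /** Queries every partition concurrently, then keeps the topN highest-scoring hits overall. */
    static List<Hit> search(List<Partition> partitions, String query, int topN) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Partition p : partitions) {
                pending.add(pool.submit(() -> p.search(query, topN)));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : pending) {
                try {
                    merged.addAll(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while searching", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("partition search failed", e.getCause());
                }
            }
            // Highest score first; the sort is stable, so ties keep partition order.
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } finally {
            pool.shutdown();
        }
    }
}
```

Each partition only needs to return its own top N, because no hit outside a partition's top N can make the overall top N.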
It sounds like rsync will work, but you'll have to make sure that the
segments file gets rsynced last.
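
The ordering constraint can be sketched as a plain file copy that defers the segments file to the end, so a reader on the target never sees a segments file that refers to data files that have not arrived yet. This is only an illustration in Java rather than an rsync recipe: the IndexSync class is hypothetical, and treating every file whose name starts with "segments" as the file to copy last is an assumption about the index layout.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: copies a flat index directory, segments file(s) last. */
final class IndexSync {

    /** Copies regular files from source to target and returns the names in the order copied. */
    static List<String> copySegmentsLast(Path source, Path target) throws IOException {
        List<Path> dataFiles = new ArrayList<>();
        List<Path> segmentsFiles = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(source)) {
            for (Path entry : entries) {
                if (!Files.isRegularFile(entry)) {
                    continue; // subdirectories are ignored in this sketch
                }
                // Assumption: names starting with "segments" identify the file(s) to defer.
                if (entry.getFileName().toString().startsWith("segments")) {
                    segmentsFiles.add(entry);
                } else {
                    dataFiles.add(entry);
                }
            }
        }
        Files.createDirectories(target);
        List<Path> ordered = new ArrayList<>(dataFiles);
        ordered.addAll(segmentsFiles); // deferred files go at the end
        List<String> copied = new ArrayList<>();
        for (Path file : ordered) {
            Files.copy(file, target.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
            copied.add(file.getFileName().toString());
        }
        return copied;
    }
}
```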
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ - Tag - Search - Share
----- Original Message ----
From: shai deljo <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, February 20, 2007 5:51:13 AM
Subject: Using Lucene - Design Question
Hi,
I have no experience with Lucene and I'm trying to collect some
information in order to determine what solution is best for me.
I need to index ~50M documents (starting with 10M); each document is
~2k-~5k in size, and I'll index a couple of fields per document. I
expect ~20 queries per second, and each query is ~4 terms. Update rate:
I'm not sure what the best and/or possible strategy is based on
performance, i.e. incremental indexing vs. pushing a full index, but as
far as the product is concerned most data can be updated daily; the
head (let's say 20%) needs hourly (or at least on the order of hours)
updates.
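
For a rough sense of scale, the figures above can be multiplied out. The sketch below assumes "k" means 1,000 bytes and measures raw document text only; the size of the resulting index is not derived here, and the CorpusSizing class is purely illustrative.

```java
/** Back-of-the-envelope arithmetic for the stated volumes (illustrative only). */
final class CorpusSizing {

    /** Raw text volume in bytes for a given document count and average document size. */
    static long rawBytes(long docs, long bytesPerDoc) {
        return docs * bytesPerDoc;
    }

    /** Converts bytes to decimal gigabytes (1 GB = 1e9 bytes). */
    static double gigabytes(long bytes) {
        return bytes / 1e9;
    }

    /** Queries per day at a sustained rate of qps queries per second. */
    static long queriesPerDay(int qps) {
        return qps * 86_400L;
    }

    public static void main(String[] args) {
        long docs = 50_000_000L;
        System.out.println("raw text, low estimate (GB):  " + gigabytes(rawBytes(docs, 2_000)));
        System.out.println("raw text, high estimate (GB): " + gigabytes(rawBytes(docs, 5_000)));
        System.out.println("queries per day at 20 qps:    " + queriesPerDay(20));
    }
}
```

That works out to roughly 100 to 250 GB of raw text at 50M documents, and about 1.7 million queries per day if 20 queries per second were sustained around the clock.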
I also need to be able to override the scoring/ranking and inject my
own logic, and of course my main concern is response time, especially
since I have additional computation on the hits before returning the
results.
BTW, for the additional ranking/computation I will need to retrieve
values that are mapped by a term-field key, i.e. I can't know the key
until I have the result and the query in my hands. I figured I would
use Oracle Berkeley DB Java Edition in order to keep the calls in
memory as much as possible -> any advice on this as well?
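
One way to keep most term-field lookups in memory, whatever store sits behind them, is a small LRU cache in front of that store. The sketch below is an assumption-laden illustration: TermFieldCache, its Key class, and the loader function standing in for the backing store are all hypothetical, and nothing here uses the Berkeley DB API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical LRU cache for values keyed by (field, term), in front of a slower store. */
final class TermFieldCache<V> {

    /** Composite key; equal when both field and term are equal. */
    static final class Key {
        final String field;
        final String term;

        Key(String field, String term) {
            this.field = field;
            this.term = term;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return field.equals(k.field) && term.equals(k.term);
        }

        @Override
        public int hashCode() {
            return Objects.hash(field, term);
        }
    }

    private final Map<Key, V> lru;
    private final Function<Key, V> loader; // stand-in for the backing store lookup

    TermFieldCache(final int maxEntries, Function<Key, V> loader) {
        this.loader = loader;
        // accessOrder = true makes iteration order least-recently-used first.
        this.lru = new LinkedHashMap<Key, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Key, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, loading and caching it on a miss; null results are not cached. */
    synchronized V get(String field, String term) {
        Key key = new Key(field, term);
        V value = lru.get(key);
        if (value == null) {
            value = loader.apply(key);
            if (value != null) {
                lru.put(key, value);
            }
        }
        return value;
    }

    /** Number of entries currently held in memory. */
    synchronized int cachedEntries() {
        return lru.size();
    }
}
```

The single lock keeps the sketch simple; under ~20 queries per second it may be adequate, but that is a guess that would need measuring.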
For these requirements, do I need to worry about partitioning the
index? If I do partition it, is there a solution to merge the results
back, or do I need to do it on my own (does Solr do it for me, and if
it does, can I override the scoring there)?
As far as serving multiple users, will a simple rsync of the index
between multiple nodes running the same index (I am not that sensitive
to data integrity) work, or do I need to look at something like
Terracotta?
In short, I am looking for the simplest solution.
Thanks in advance.
Shi
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]