Thanks for the quick responses!
Good point about the warmup issues Erick, that's something we will
consider. Good to know that this kind of setup has been proved working
for at least one :) I think we will do a small setup and test the
performance.
Thanks again for valuable input!
Best Regards
Tobias
On 16 sep 2008, at 18.10, Otis Gospodnetic wrote:
Tobias,
That's the approach I took with Simpy.com and it's been working well
for several years now. You'll have to keep track of searchers and
close them when appropriate, of course.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
On 16 sep 2008, at 17.17, Erick Erickson wrote:
The main arguments against using many separate indexes are
1> search warmup time. That is, each time you open an index
the first few queries take much longer than subsequent searches.
2> Managing a bazillion indexes is non-trivial.
That said, in your particular case these may not apply. I guess the
piece of information that really counts is "how often do you expect
to update/search a given index"? You could avoid the warmup issue
by keeping an index open for some period of time after the first
search on the assumption that the user is going to make multiple
searches rather than just one. I'm sure there are other tricks
you can try.
So, how often do you expect
1> users to backup date
2> users to query data?
and what is acceptable search response time? and are your
users willing to live with a significant delay on the first couple
of queries?
I'd only be comfortable with choosing an approach if I tried
it out with a single computer's content and generated a few
stats....
Best
Erick
----- Original Message ----
From: Tobias Larsson Hult <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, September 16, 2008 10:55:09 AM
Subject: Using separate index for each user
Hi,
We're thinking of using Lucene to integrate search in a backup
service
application. The background is that we have a bunch of users using a
backup service, and we want them to be able to search their own, and
only their own, backups.
The total amount of data that's being backed up is very large (size
in
terabyte). Even though the index will probably be smaller due to only
indexing relevant fields, it is still to much to incorporate in one
index. But since a user will only search in his/her own files we're
thinking of creating one index for each user. There will be a lot of
indexes of course but each index will not span to more than a couple
of gigabytes at the most.
So when a user searches or adds new content to the backup we will
open
up his/her index and to a search/update in that particular index.
That
way, each query/update should not be so performance intense.
Does this sound like a reasonable solution? Of course this means
creating a lot of IndexReaders/Writers but I prefer that to searching
in a huge index everytime when a user only wants to search in a slice
of the total index.
Best Regards,
Tobias Larsson Hult
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]