Re: Using separate index for each user

Erick Erickson Thu, 18 Sep 2008 06:35:41 -0700

uuuhhhh, take anything Otis says as *much* more informed than anything I say
on this topic <G>.


Erick

On Thu, Sep 18, 2008 at 2:32 AM, Tobias Larsson Hult <
[EMAIL PROTECTED]> wrote:

> Thanks for the quick responses!
>
> Good point about the warmup issues Erick, that's something we will
> consider. Good to know that this kind of setup has been proven to work for
> at least one person :) I think we will do a small setup and test the
> performance.
>
> Thanks again for valuable input!
>
> Best Regards
> Tobias
>
>
> On 16 sep 2008, at 18.10, Otis Gospodnetic wrote:
>
>> Tobias,
>>
>> That's the approach I took with Simpy.com and it's been working well for
>> several years now.  You'll have to keep track of searchers and close them
>> when appropriate, of course.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>>
> On 16 sep 2008, at 17.17, Erick Erickson wrote:
>
>>
>> The main arguments against using many separate indexes are
>> 1> search warmup time. That is, each time you open an index
>>    the first few queries take much longer than subsequent searches.
>> 2> Managing a bazillion indexes is non-trivial.
>>
>>
>> That said, in your particular case these may not apply. I guess the
>> piece of information that really counts is "how often do you expect
>> to update/search a given index"? You could avoid the warmup issue
>> by keeping an index open for some period of time after the first
>> search on the assumption that the user is going to make multiple
>> searches rather than just one. I'm sure there are other tricks
>> you can try.
>>
>> So, how often do you expect
>> 1> users to back up data?
>> 2> users to query data?
>> And what is an acceptable search response time? Are your
>> users willing to live with a significant delay on the first couple
>> of queries?
>>
>> I'd only be comfortable with choosing an approach if I tried
>> it out with a single computer's content and generated a few
>> stats....
>>
>> Best
>> Erick
>>
>> ----- Original Message ----
>>
>>> From: Tobias Larsson Hult <[EMAIL PROTECTED]>
>>> To: java-user@lucene.apache.org
>>> Sent: Tuesday, September 16, 2008 10:55:09 AM
>>> Subject: Using separate index for each user
>>>
>>> Hi,
>>>
>>> We're thinking of using Lucene to integrate search in a backup service
>>> application. The background is that we have a bunch of users using a
>>> backup service, and we want them to be able to search their own, and
>>> only their own, backups.
>>>
>>> The total amount of data that's being backed up is very large
>>> (terabytes in size). Even though the index will probably be smaller due
>>> to only indexing relevant fields, it is still too much to incorporate in
>>> one index. But since a user will only search in his/her own files we're
>>> thinking of creating one index for each user. There will be a lot of
>>> indexes of course but each index will not grow beyond a couple
>>> of gigabytes at the most.
>>>
>>> So when a user searches or adds new content to the backup we will open
>>> up his/her index and do a search/update in that particular index. That
>>> way, each query/update should not be so performance-intensive.
>>>
>>> Does this sound like a reasonable solution?  Of course this means
>>> creating a lot of IndexReaders/Writers but I prefer that to searching
>>> in a huge index every time when a user only wants to search in a slice
>>> of the total index.
>>>
>>> Best Regards,
>>> Tobias Larsson Hult
>>>
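
The two suggestions in the thread (keep a user's index open for a while
after the first search, and keep track of searchers and close them when
appropriate) could be sketched roughly as below. This is only an
illustration under stated assumptions, not what Simpy.com or anyone in the
thread actually runs: SearcherCache, acquire, evictIdle, the idle timeout
and the open-handle limit are all hypothetical names, and the type S is a
stand-in for whatever searcher handle is in use (for instance a Lucene
searcher wrapped in an AutoCloseable). No Lucene call is made, and the
sketch does not protect a search that is still running when its handle is
closed; real code would need reference counting or similar.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical per-user searcher cache: one open handle per user, closed
 * after an idle period or when too many handles are open at once.
 * S is a placeholder for the real searcher type.
 */
class SearcherCache<S extends AutoCloseable> {

    private static final class Entry<S> {
        final S searcher;
        long lastUsedMillis;

        Entry(S searcher, long nowMillis) {
            this.searcher = searcher;
            this.lastUsedMillis = nowMillis;
        }
    }

    // Access-ordered map: iteration starts at the least recently used user.
    private final Map<String, Entry<S>> open = new LinkedHashMap<>(16, 0.75f, true);
    private final Function<String, S> opener; // opens the index for a user id
    private final long maxIdleMillis;
    private final int maxOpen;

    SearcherCache(Function<String, S> opener, long maxIdleMillis, int maxOpen) {
        this.opener = opener;
        this.maxIdleMillis = maxIdleMillis;
        this.maxOpen = maxOpen;
    }

    /** Returns the user's open searcher, opening it on first use. */
    synchronized S acquire(String userId, long nowMillis) {
        evictIdle(nowMillis);
        Entry<S> entry = open.get(userId);
        if (entry == null) {
            if (open.size() >= maxOpen) {
                closeLeastRecentlyUsed();
            }
            entry = new Entry<>(opener.apply(userId), nowMillis);
            open.put(userId, entry);
        }
        entry.lastUsedMillis = nowMillis;
        return entry.searcher;
    }

    /** Closes every searcher that has been idle for maxIdleMillis or longer. */
    synchronized void evictIdle(long nowMillis) {
        Iterator<Entry<S>> it = open.values().iterator();
        while (it.hasNext()) {
            Entry<S> entry = it.next();
            if (nowMillis - entry.lastUsedMillis >= maxIdleMillis) {
                it.remove();
                closeQuietly(entry.searcher);
            }
        }
    }

    synchronized int openCount() {
        return open.size();
    }

    private void closeLeastRecentlyUsed() {
        Iterator<Entry<S>> it = open.values().iterator();
        if (it.hasNext()) {
            Entry<S> eldest = it.next();
            it.remove();
            closeQuietly(eldest.searcher);
        }
    }

    private void closeQuietly(S searcher) {
        try {
            searcher.close();
        } catch (Exception e) {
            // A real system would log this; the sketch ignores it.
        }
    }
}
```

The clock value is passed in explicitly so the expiry behaviour can be
exercised without waiting; a caller would normally pass
System.currentTimeMillis(). With this shape only the first search after a
user's handle has been closed pays the warmup cost, and the idle timeout
and open limit are the knobs to tune against the stats Erick suggests
gathering.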
