If you are going to shard, and of course depending on the profile of the 
queries you expect to service, consider designing your shards around mail date. 
I read somewhere that for mailboxes 90% of the activity is within 10% of the 
mail items, with the most recent mails being that 10%. This would be particularly 
attractive if you expect to service a considerable number of cross-user 
searches.

You might be able to use as little as a couple of "hot" indices and an archive 
index. 
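
As a rough illustration of the date-based layout suggested above, here is a minimal Python sketch of routing a mail to one of two "hot" indices or an archive index, and of picking which indices a date-bounded search has to touch. The index names and the 30-day and 90-day cutoffs are assumptions made for the example; they are not from this thread and would need tuning against real usage.

```python
from datetime import date

# Hypothetical cutoffs (assumptions, tune against real mailbox activity).
HOT_CURRENT_DAYS = 30   # mails this recent go to the first hot index
HOT_PREVIOUS_DAYS = 90  # mails this recent go to the second hot index


def shard_for(mail_date: date, today: date) -> str:
    """Return the (hypothetical) index name a mail of this date belongs in."""
    age = (today - mail_date).days
    if age <= HOT_CURRENT_DAYS:
        return "hot-current"
    if age <= HOT_PREVIOUS_DAYS:
        return "hot-previous"
    return "archive"


def shards_to_search(oldest_wanted: date, today: date) -> list:
    """Return the indices a search reaching back to oldest_wanted must query."""
    age = (today - oldest_wanted).days
    shards = ["hot-current"]
    if age > HOT_CURRENT_DAYS:
        shards.append("hot-previous")
    if age > HOT_PREVIOUS_DAYS:
        shards.append("archive")
    return shards
```

A search limited to recent mail only touches the small hot indices; the archive is opened only when the requested date range reaches back far enough. Mails also have to be moved between indices as they age, which this sketch does not cover.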

Yours,
Moray
------------------------------------- 
Moray McConnachie
Head of IS        +44 1865 261 600
Oxford Analytica  http://www.oxan.com

-----Original Message-----
From: Pierre Henri Kuaté [mailto:[email protected]] 
Sent: 05 May 2009 13:29
To: [email protected]
Subject: RE: Designing an index with constant speed no matter how big

These are very useful suggestions; I will investigate all the tips in the 
Lucene wiki.

Btw, when I said that "this doesn't work", referring to: OwnerId:123 AND 
MailContent:Something, I meant that it was still very slow.

My application doesn't sort using Lucene and generally retrieves less than 100 
docs.

I think the most promising solution is sharding...
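
One possible middle ground between a single shared index and one index per user is to spread users over a fixed number of shards. The Python sketch below only shows the routing step; the shard count of 8 and the function name are assumptions for illustration, and it presumes integer owner IDs like the OwnerId:123 in the query above.

```python
NUM_SHARDS = 8  # hypothetical shard count, chosen only for the example


def shard_for_owner(owner_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map an integer owner ID to a shard number in [0, num_shards)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Modulo keeps a given owner on the same shard for as long as
    # num_shards does not change.
    return owner_id % num_shards
```

Each search then opens only the shard for the logged-in user and still applies the OwnerId clause inside it. Changing the shard count later means re-indexing, since owners would map to different shards.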

Thanks,
Pierre Henri.


--- On Sun, 5/3/09, Nitin Shiralkar <[email protected]> wrote:

From: Nitin Shiralkar <[email protected]>
Subject: RE: Designing an index with constant speed no matter how big
To: "[email protected]" 
<[email protected]>
Date: Sunday, May 3, 2009, 3:27 PM

Hi Pierre,

We have implemented our search engine in a similar fashion and it is working 
absolutely fine. A few questions:

1. Do you sort on any field while searching? If yes, then remove the sort and 
check again.
2. How many results are retrieved while searching? If you are retrieving more 
than 100 documents, then use a HitCollector.


-----Original Message-----
From: Digy [mailto:[email protected]]
Sent: Sunday, May 03, 2009 3:57 PM
To: [email protected]
Subject: RE: Designing an index with constant speed no matter how big

Could it be related to your code? Lucene.Net can handle very large 
indices easily.
Have you tried the search speed improvement techniques in 
http://wiki.apache.org/jakarta-lucene/ImproveSearchingSpeed ?


> My current implementation is to have a property OwnerId in each 
> document and use it as a clause in the searches. Eg: OwnerId:123 AND 
> MailContent:Something
> However, this doesn't work...

I don't understand why this didn't work.

DIGY

-----Original Message-----
From: Pierre Henri Kuaté [mailto:[email protected]]
Sent: Saturday, May 02, 2009 11:02 PM
To: [email protected]
Subject: Designing an index with constant speed no matter how big

Hi,

I am working on a project where full-text search gets slower as the number of 
(group of) documents increases.

Here is a simplified description of the project: It is an email system, so each 
user has their own emails and can search them using Lucene.net.
So logically, it should be possible to implement it so that its performance 
doesn't (really) drop as the number of users increases. The speed of a search 
should depend on the number of documents that the logged-in user has.

My current implementation is to have a property OwnerId in each document and 
use it as a clause in the searches. E.g.: OwnerId:123 AND MailContent:Something. 
However, this doesn't work...

The extreme solution would be to completely dissociate each user's index.
But that would make my implementation harder to maintain.

Do you have any suggestions?

Pierre Henri.