The size of your index isn't a very useful number on its own; what matters
is its structure. How much memory you need depends on what's stored, what's
indexed, and what kind of searching you're doing (sorting, for example).
About all we can say is that you'll probably need less than 100G. Here are a
few rough ideas to try to pin this down:

1> Turn off all storage, index a representative sample, and extrapolate
the eventual size. This gives a rough measure of the part of your index
that is relevant to searching. Do note that the size of the index will *not*
grow linearly: it'll grow very quickly for a while, then the rate of increase
will decline.

2> As Ian said, think about turning off norms if you won't need them. See the
"Norms" section at
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr

3> Try some searches and use one of the many measurement tools to see how
much memory is being used currently.
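
The extrapolation in point 1 can be sketched as below. This is only a rough
illustration in plain Java, not anything from Lucene: the class and method
names are made up, and the numbers in main are placeholders rather than
measurements. It assumes you have indexed a few samples of increasing size
with storage turned off and noted the on-disk index size each time. It then
extends the last measurement using the most recent bytes-per-document rate,
so if that rate is still falling the estimate will err on the high side.

```java
// Hypothetical helper: extrapolates index size from a few sample measurements.
class IndexSizeEstimate {

    /** Bytes added per document between two measurements (A earlier, B later). */
    static double marginalBytesPerDoc(long docsA, long bytesA, long docsB, long bytesB) {
        return (double) (bytesB - bytesA) / (docsB - docsA);
    }

    /**
     * Extends the last measurement out to targetDocs, using the growth rate
     * observed between the two most recent samples. Needs at least two samples,
     * ordered by increasing document count.
     */
    static long extrapolate(long[] docs, long[] bytes, long targetDocs) {
        int n = docs.length;
        double rate = marginalBytesPerDoc(docs[n - 2], bytes[n - 2], docs[n - 1], bytes[n - 1]);
        return bytes[n - 1] + Math.round(rate * (targetDocs - docs[n - 1]));
    }

    public static void main(String[] args) {
        // Placeholder numbers for illustration only; substitute your own measurements.
        long[] docs = {100_000L, 200_000L, 400_000L};
        long[] bytes = {300_000_000L, 500_000_000L, 800_000_000L};

        // Print the marginal rate for each step to see whether growth is flattening.
        for (int i = 1; i < docs.length; i++) {
            System.out.printf("%d -> %d docs: %.0f bytes/doc%n", docs[i - 1], docs[i],
                    marginalBytesPerDoc(docs[i - 1], bytes[i - 1], docs[i], bytes[i]));
        }
        System.out.println("estimate at 1,000,000 docs: "
                + extrapolate(docs, bytes, 1_000_000L) + " bytes");
    }
}
```

Watching the per-step rate matters more than the final number: once it stops
dropping between samples, the extrapolation becomes more trustworthy.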

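If you don't have a profiler handy for point 3, a crude first look is
possible from inside the JVM itself. The sketch below uses only
java.lang.Runtime; the class name and the stand-in workload are made up, and
you would replace the workload with real searches against your index. Note
that it sees only the JVM heap, not the operating system's file cache, and
that a garbage collection in the middle of the run can make the delta
misleading (even negative).

```java
// Hypothetical helper: rough heap-usage probe around a batch of searches.
class HeapProbe {

    /** Heap currently in use by this JVM, in bytes. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs the workload and returns the change in used heap, in bytes. */
    static long heapDeltaBytes(Runnable workload) {
        System.gc(); // only a hint to the JVM; it may be ignored
        long before = usedHeapBytes();
        workload.run();
        return usedHeapBytes() - before;
    }

    public static void main(String[] args) {
        // Stand-in workload that just holds on to some memory.
        // Replace it with code that opens your index and runs typical queries.
        final byte[][] held = new byte[32][];
        Runnable workload = () -> {
            for (int i = 0; i < held.length; i++) {
                held[i] = new byte[1 << 20]; // 1 MB each
            }
        };

        long delta = heapDeltaBytes(workload);
        System.out.printf("heap delta: %d KB, heap in use now: %d KB%n",
                delta / 1024, usedHeapBytes() / 1024);
    }
}
```

Treat the result as a sanity check only; a real measurement tool will tell
you far more about where the memory goes.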

As for how many queries/second, there's no way we can answer that from the
information you've provided; you have to measure. If one machine can't keep
up, you can always copy the index to N machines and put a load balancer in
front of them, for instance.
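
Measuring throughput can start out very simply. The following is a minimal
single-threaded sketch in plain Java; the class name and the stand-in query
are made up, and you would replace the query with a real search against
your own index, using queries that resemble production traffic. It warms
up first so that JIT compilation and cold caches don't dominate the timing.

```java
// Hypothetical helper: times repeated runs of a query and reports queries/second.
class QpsProbe {

    static volatile long sink; // keeps the stand-in work from being optimized away

    /** Runs the query the given number of times and returns queries per second. */
    static double queriesPerSecond(Runnable query, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            query.run();
        }
        long elapsedNanos = Math.max(System.nanoTime() - start, 1L); // avoid divide-by-zero
        return iterations / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        // Stand-in for a real search call; replace with your own query code.
        Runnable query = () -> {
            long sum = 0;
            for (int i = 0; i < 100_000; i++) {
                sum += i;
            }
            sink = sum;
        };

        queriesPerSecond(query, 1_000); // warm-up pass, result discarded
        System.out.printf("%.1f queries/second%n", queriesPerSecond(query, 10_000));
    }
}
```

A single thread gives you per-query latency more than peak throughput; to
see what a box can really sustain, run the same loop from several threads
at once and add up the results.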

Solr is always an option here as well.

Erick




On Wed, Dec 23, 2009 at 12:27 AM, Shakti Purohit <
shakti_puro...@persistent.co.in> wrote:

> We are required to find out how much percentage/part of lucene index needs
> to be in memory for acceptable search response time. The index size we have
> is around 100 GB while the available memory is 24 GB. Since we do not have
> the option of loading whole of the index in memory we wanted to know what
> minimum part of lucene index be loaded in memory so that response time is
> not affected.
> Does the index consist of any files or hierarchy such that loading only
> this file/information in memory and not whole of the index would suffice for
> faster response time.
>
> The other question I have is how many queries per second lucene can
> support? We are interested in finding out throughput of the system.
>
> Thanks,
> Shakti
