RE: [Lucene-users] Performance/Scalability Benchmarks for Lucene

Cory L Hubert Mon, 11 Jun 2001 15:19:44 -0700
I think we need to integrate JUnit into Lucene. There are JUnit
components that can do metrics. That would give us solid answers to our
performance and scalability questions.

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of W. Eliot Kimber
Sent: Monday, June 11, 2001 12:16 PM
To: Tal Dayan
Cc: [EMAIL PROTECTED]
Subject: Re: [Lucene-users] Performance/Scalability Benchmarks for Lucene


Tal Dayan wrote:
>
> Hi Eliot,
>
> Are all 10,000 users doing searches all day? Can you estimate
> the required peak number of searches per second?

I would think something like 1000/second would be about right--that is,
at any moment, 10% of the 10,000 active users would be requesting a
search. This reflects a use case in which searching is one of the
primary services the system provides and is one of the primary means of
finding things in the system.
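
The arithmetic behind that estimate can be sketched as below. This is only a rough back-of-envelope calculation: the 10,000 users and the 10% figure come from the message, while the one-search-per-second rate per searching user and the even split across servers (`per_server_load`) are illustrative assumptions, not anything measured.

```python
# Back-of-envelope peak search load, using the figures quoted in the message.
ACTIVE_USERS = 10_000      # from the thread
SEARCHING_PERCENT = 10     # "10% of the 10,000 active users"


def peak_searches_per_second(active_users, searching_percent,
                             searches_per_user_per_second=1.0):
    """Peak load, ASSUMING each searching user issues one search per second."""
    searching_users = active_users * searching_percent / 100
    return searching_users * searches_per_user_per_second


def per_server_load(total_qps, servers):
    """Hypothetical even split of the load across identical servers."""
    return total_qps / servers


peak = peak_searches_per_second(ACTIVE_USERS, SEARCHING_PERCENT)
print(f"peak load: {peak:.0f} searches/second")
for servers in (1, 4, 8):
    share = per_server_load(peak, servers)
    print(f"{servers} server(s): {share:.0f} searches/second each")
```

If searching users issue fewer than one search per second, the peak figure drops proportionally, so the per-user rate is the number worth pinning down first.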

> What about hardware? Is a multi-server solution practical? What
> kind of hardware do you have in mind?

I think we are anticipating the usual beefy hardware you need to drive a
system of this scale in any case--big Sun machines, high-speed storage,
etc. That is, I think we can presume the fastest possible hardware (which
would be a requirement for delivering the overall system performance
anyway, not just the indexing).

> What is the expected total size of your data? How often does it
> change or need to be reindexed?

In the large-scale use case, most documents would be in the 50-100K
range (that is, typical business documents), but there would be hundreds
of thousands or millions of documents to be indexed. I'm not sure what
our expected rate for adding new docs is, but I would think that 100/hour
would be about right. Each new document would need to be indexed. In
the non-versioned case, existing (and previously indexed) docs would be
replaced by new copies, which would have to be re-indexed. This would
probably account for maybe 10 docs an hour.
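
Those figures can be turned into a rough sizing estimate, sketched below. The document sizes and hourly rates come from the message. The two document counts are illustrative points picked for "hundreds of thousands or millions", the decimal kilobyte is an assumption, and so is treating the 10 replaced docs an hour as additional to the 100 new ones. The result is raw document volume, not index size.

```python
# Rough sizing of the corpus and the hourly indexing volume.
KB = 1_000                            # ASSUMPTION: decimal kilobytes

DOC_SIZE_KB = (50, 100)               # from the message: "50-100K"
DOC_COUNTS = (100_000, 1_000_000)     # ASSUMPTION: illustrative low/high counts
NEW_DOCS_PER_HOUR = 100               # from the message
REPLACED_DOCS_PER_HOUR = 10           # from the message; ASSUMED additional


def corpus_size_gb(doc_count, doc_size_kb):
    """Raw document volume in gigabytes (not the size of the index)."""
    return doc_count * doc_size_kb * KB / 1_000_000_000


def hourly_index_volume_mb(docs_per_hour, doc_size_kb):
    """Raw megabytes of documents passed to the indexer per hour."""
    return docs_per_hour * doc_size_kb * KB / 1_000_000


docs_per_hour = NEW_DOCS_PER_HOUR + REPLACED_DOCS_PER_HOUR
smallest = corpus_size_gb(DOC_COUNTS[0], DOC_SIZE_KB[0])
largest = corpus_size_gb(DOC_COUNTS[1], DOC_SIZE_KB[1])
print(f"corpus: {smallest:.0f} to {largest:.0f} GB of raw documents")
print(f"indexing: {docs_per_hour} docs/hour, "
      f"{hourly_index_volume_mb(docs_per_hour, DOC_SIZE_KB[0]):.1f} to "
      f"{hourly_index_volume_mb(docs_per_hour, DOC_SIZE_KB[1]):.1f} MB/hour")
```

Under these assumptions the indexing rate is small next to the size of the corpus, which suggests query throughput is the harder target to meet.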

In the versioned content management use case, existing versions are
never deleted and their indexes persist indefinitely, so indexing would
always be additive, with no need to invalidate existing indexes because
documents had been deleted.

This is about as specific as I can be--I'm mostly wondering whether
people have used Lucene at these sorts of scales (or something
close--our scale targets are pretty high, reflecting the needs of the
largest enterprises) or whether there are some existing scalability
tests that we can run on our test bed to get some baseline numbers.

Thanks,

Eliot
--
. . . . . . . . . . . . . . . . . . . . . . . .

W. Eliot Kimber | Lead Brain

1016 La Posada Dr. | Suite 240 | Austin TX  78752
    T 512.656.4139 |  F 512.419.1860 | [EMAIL PROTECTED]

w w w . d a t a c h a n n e l . c o m

_______________________________________________
Lucene-users mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/lucene-users

