I have a project for which I want to characterize Lucene query performance
on different size archives of my XML files.  I have created archives
and indices of 1000, 2000, 4000, 8000, and 16000 XML files (average
file size about 10K) generated from
my DTD and containing mostly random string content in the simple
elements.  I run multiple tests, each with different random content
in the archive, timing each of three different queries:

  query 1: Field1:stringA
  query 2: Field1:stringA Field2:stringB
  query 3: Field1:stringA AND Field2:stringB

The time to complete query 1 increases with archive size, but the
times for the subsequent query 2 and query 3 are ALL about the same
(generally less than 1 sec, on a Sun Ultra 60 with two 450 MHz
processors & 512 MB memory, running Solaris 9, Java 1.4,
Lucene 1.2) regardless of archive size.

I expected the time to complete query 2 and 3 to also increase
with archive size, but as I said it remained constant.  What
is Lucene doing (caching?) to make this happen?
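
One way to separate a one-time warm-up cost from the per-query cost
would be to time the same three queries in rotated order, so that each
one gets a turn at running first.  The following is only a minimal
sketch of such a harness, written in current Java rather than Java 1.4.
The class name, the method names, and the `runQuery` callback are
hypothetical; the callback stands in for whatever code opens a searcher
and executes the query, and no Lucene calls are shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class QueryOrderTimer {

    /** Times each query once, in the given order; returns query -> elapsed ms. */
    public static Map<String, Long> timeInOrder(List<String> queries,
                                                Consumer<String> runQuery) {
        Map<String, Long> elapsed = new LinkedHashMap<>();
        for (String q : queries) {
            long start = System.nanoTime();
            runQuery.accept(q);  // hypothetical hook: run the query here
            elapsed.put(q, (System.nanoTime() - start) / 1_000_000);
        }
        return elapsed;
    }

    /** Returns every rotation of the list, so each query runs first once. */
    public static List<List<String>> rotations(List<String> queries) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < queries.size(); i++) {
            List<String> copy = new ArrayList<>(queries);
            Collections.rotate(copy, -i);  // shift left by i positions
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> queries = Arrays.asList(
            "Field1:stringA",
            "Field1:stringA Field2:stringB",
            "Field1:stringA AND Field2:stringB");
        // Placeholder that does nothing; replace with real query execution.
        Consumer<String> runQuery = q -> { };
        for (List<String> order : rotations(queries)) {
            System.out.println(order + " -> " + timeInOrder(order, runQuery));
        }
    }
}
```

If whichever query runs first is always the slow one, the growth with
archive size would point to start-up cost rather than to query 1
itself.  The operating system's file cache stays warm between runs, so
this sketch does not control for that.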





