I have a project for which I want to characterize Lucene query performance on different size archives of my XML files. I have created archives and indices of 1000, 2000, 4000, 8000, and 16000 XML files (average file size about 10K) generated from my DTD and containing mostly random string content in the simple elements. I run multiple tests with different random content in each in the archive, timing each of three diffenent queries:
query 1: Field1:stringA query 2: Field1:stringA Field2:stringB query 3: Field1:stringA AND Field2:stringB
the time to complete query 1 increases with archive size, but the subsequent query 2 and query 3 times are ALL about the same (generally less than 1 sec, on a Sun Ultra 60 with 2 450 MHz processors & 512 MB memory, running Solaris 9, Java 1.4, Lucene 1.2) regardless of archive size.
I expected the time to complete query 2 and 3 to also increase with archive size, but as I said it remained constant. What is Lucene doing (caching?) to make this happen?
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
