Lucene (version 3.5) seems to be caching the field values of documents after they have been retrieved, and I am hoping someone can explain how and where exactly these values are stored.
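One way to narrow down where the data is being held is to take Lucene out of the picture. The helper below is hypothetical and not part of Lucene's API. It times two back-to-back raw reads of one of the index files. If the first read is much slower than the second, and reads stay fast when the program is started again as a new process, the speedup most likely comes from the operating system's file cache rather than from anything inside the JVM. This is a hedged diagnostic sketch, not an explanation of Lucene internals.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical diagnostic (not a Lucene class): times two consecutive full
 * reads of the same file using only JDK I/O, so any difference between the
 * two timings cannot come from Lucene-level caching.
 */
public class FileCacheProbe {

    /** Reads the whole file in 64 KB chunks and returns the number of bytes read. */
    static long readFully(Path file) throws IOException {
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Returns {bytesRead, elapsedMillis} for one full read of the file. */
    static long[] timedRead(Path file) throws IOException {
        long start = System.nanoTime();
        long bytes = readFully(file);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return new long[] {bytes, elapsedMs};
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FileCacheProbe <index-file>");
            return;
        }
        // Assumption: point this at a large file in the index directory, e.g. a
        // stored-fields data file (.fdt) or a compound segment file (.cfs).
        Path file = Paths.get(args[0]);
        long[] first = timedRead(file);
        long[] second = timedRead(file);
        System.out.printf("first read:  %d bytes in %d ms%n", first[0], first[1]);
        System.out.printf("second read: %d bytes in %d ms%n", second[0], second[1]);
    }
}
```

Running it twice as separate processes on the same file shows whether the "warm" timing survives a JVM restart, which is the cross-process behavior described in this post.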
The table below lists the times (in milliseconds) taken to retrieve a single stored value from each document matching a particular query. Results are shown for three queries (A, B, and C), each submitted several times. The first time each query is submitted, retrieving the values of its matching documents takes considerably longer than on any later submission.

Run  Query  nDocs  Time (ms)
  1  A        489       1342
  2  A        489        811
  3  B      47038      76658
  4  B      47038       1062
  5  C       5256      22741
  6  C       5256        578
  7  A        489        515
  8  A        489        514
  9  B      47038       1000
 10  B      47038        967
 11  C       5256        563
 12  C       5256        562

Whatever is being cached remains available across separate processes, so presumably it resides somewhere in the file system (and/or virtual memory). I have also seen the same behavior when retrieving TermFreqVector information.

Any additional insight is appreciated!

Thanks,
Stuart

__________________________________________________
Stuart Rose
Senior Research Engineer
Pacific Northwest National Laboratory
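To put the timings reported above on a common scale, the short program below divides each time by the number of matching documents. It then compares the first submission of each query with its last one. The numbers are copied from the reported runs. The class and method names are illustrative only.

```java
/**
 * Per-document retrieval cost for the first (cold) and last (warm) submission
 * of each query, using the timings reported in this post (milliseconds).
 */
public class ColdWarmTimings {

    /** Average milliseconds spent per retrieved document. */
    static double msPerDoc(long timeMs, long nDocs) {
        return (double) timeMs / nDocs;
    }

    /** How many times slower the cold run was than the warm run. */
    static double slowdown(long coldMs, long warmMs) {
        return (double) coldMs / warmMs;
    }

    public static void main(String[] args) {
        String[] query = {"A", "B", "C"};
        long[] nDocs  = {489, 47038, 5256};
        long[] coldMs = {1342, 76658, 22741}; // first submission (runs 1, 3, 5)
        long[] warmMs = {514, 967, 562};      // last submission (runs 8, 10, 12)
        for (int i = 0; i < query.length; i++) {
            System.out.printf("%s: cold %.3f ms/doc, warm %.3f ms/doc, %.1fx slower cold%n",
                    query[i],
                    msPerDoc(coldMs[i], nDocs[i]),
                    msPerDoc(warmMs[i], nDocs[i]),
                    slowdown(coldMs[i], warmMs[i]));
        }
    }
}
```

For query B this works out to roughly 1.63 ms per document on the first run versus about 0.02 ms afterwards, a slowdown of about 79x. That is the scale of effect the caching has to explain.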