My Lucene index has about 3 million documents, and result sets can be large: 
often thousands of hits and sometimes as many as 100,000.  I expect the index 
to grow 5-10x as the system matures.

I index 5 fields and, per recommendations I’ve read, store only minimal data 
in Lucene, currently just a 12-byte numeric identifier (a Mongo ObjectId) per 
document.  I store the rest of the data separately and use the id I get from 
Lucene to look it up there.

In my load testing, a search like this:

    TopDocs docs = indexSearcher.search(query, maxResults, sort);

takes about 50-75 msec, which is good.  Retrieving the documents with a loop 
like this:

    for (int i = 0; i < docs.scoreDocs.length; i++) {
        ScoreDoc sdoc = docs.scoreDocs[i];
        String id = indexReader.document(sdoc.doc, Collections.singleton("pos_id"))
                .getField("pos_id").stringValue();
        // … retrieve data with id
    }

takes around 350-400 msec, sometimes as long as 800 msec.  I’m looking for ways 
to reduce this time.

I’ve read up on DocValues but am not sure whether they are intended to help 
with this.  As I understand it, DocValues are a separate store mapping Lucene’s 
internal document ids to a per-document value (here my “pos_id”), which sounds 
like it may help.  I tried getting the ids from my reader like this:

    String id = MultiDocValues.getBinaryValues(indexReader, "pos_id")
            .get(sdoc.doc).utf8ToString();

But performance was no better.  However, I saw in the docs for MultiDocValues 
that performance may be better if I get the "atomic leaves and then operate 
per-LeafReader".  I searched around and could not find documentation on how to 
do that.  I see some examples using leaf readers in the Solr project, but they 
are just examples and I don’t think they were written specifically to optimize 
performance.  It would be great to find an explanation of why there are 
multiple leaf readers per reader and how to use them.
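
From what I have pieced together so far, a top-level reader is made up of one 
leaf per index segment, each leaf has a docBase offset, and a top-level doc id 
belongs to the last leaf whose docBase is not greater than it (I believe this 
is what IndexReader.leaves(), LeafReaderContext.docBase and 
ReaderUtil.subIndex expose).  Below is a minimal plain-Java sketch of just that 
lookup, with no Lucene types.  The class name, method names and docBase values 
are made up for illustration, and I am assuming the docBases are ascending and 
start at 0:

```java
// Sketch only: LeafLookupSketch and its methods are hypothetical names.
// docBases stands in for the docBase of each leaf (segment), ascending from 0.
final class LeafLookupSketch {

    // Index of the leaf that holds globalDoc: the last leaf with docBase <= globalDoc.
    static int leafIndex(int globalDoc, int[] docBases) {
        int lo = 0;
        int hi = docBases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // upper midpoint so the loop always shrinks
            if (docBases[mid] <= globalDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // Doc id relative to its own leaf, which is what a per-leaf lookup expects.
    static int localDoc(int globalDoc, int[] docBases) {
        return globalDoc - docBases[leafIndex(globalDoc, docBases)];
    }

    public static void main(String[] args) {
        int[] docBases = {0, 100, 250}; // three made-up segments
        int globalDoc = 180;            // e.g. a ScoreDoc.doc value
        System.out.println(leafIndex(globalDoc, docBases) + " " + localDoc(globalDoc, docBases));
        // prints "1 80"
    }
}
```

If that is right, then operating per leaf would mean resolving each sdoc.doc 
to a (leaf, local id) pair like this and reading the value from that leaf's 
own DocValues, but I have not confirmed that this is the intended usage or 
that it is faster.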

So my questions are 1) are DocValues a possibility for improving my document 
retrieval performance, and 2) if so, where can I find an example of this that 
is written for best performance?

Thanks in advance!

Randy

