RE: Alternative to looping through Hits

Franklin Simmons Fri, 02 Oct 2009 12:57:03 -0700

Trevor,

Is your index optimized? How many documents are in your index? Is the OCR field 
stored, and if so, is the "DocumentId" field always the first field in the 
document? My impression is that field order can make a difference, although I 
can't recall any specific discussion that confirms that conjecture. You might 
also consider adding fine-grained timing measurements to identify the bottleneck.
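
For the timing suggestion, a minimal sketch using System.Diagnostics.Stopwatch 
might look like the following. PhaseTimer and its method names are hypothetical 
(not part of Lucene.Net), and the usage shown in the trailing comment assumes 
the search and topDocs variables from your own snippets.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper (not part of Lucene.Net): accumulates elapsed
// wall-clock time per named phase so that the search, the stored-field
// load, and the string handling can be compared separately.
public static class PhaseTimer
{
    private static readonly Dictionary<string, Stopwatch> Phases =
        new Dictionary<string, Stopwatch>();

    // Runs the action and adds its elapsed time to the named phase.
    public static void Measure(string phase, Action action)
    {
        Stopwatch sw;
        if (!Phases.TryGetValue(phase, out sw))
        {
            sw = new Stopwatch();
            Phases[phase] = sw;
        }
        sw.Start();
        try { action(); }
        finally { sw.Stop(); }
    }

    // Total time recorded so far for a phase (zero if never measured).
    public static TimeSpan Elapsed(string phase)
    {
        Stopwatch sw;
        return Phases.TryGetValue(phase, out sw) ? sw.Elapsed : TimeSpan.Zero;
    }

    // Writes one line per phase to the console.
    public static void Report()
    {
        foreach (KeyValuePair<string, Stopwatch> p in Phases)
            Console.WriteLine("{0}: {1} ms", p.Key, p.Value.ElapsedMilliseconds);
    }
}

// Assumed usage inside the existing loop (variables from the original code):
//   PhaseTimer.Measure("search", () => topDocs = search.Search(query, null, 100000));
//   foreach (Lucene.Net.Search.ScoreDoc sd in topDocs.scoreDocs)
//       PhaseTimer.Measure("load doc", () => docTest = search.Doc(sd.doc));
//   PhaseTimer.Report();
```

If most of the time lands in the "load doc" phase, that would point at 
stored-field loading rather than the search itself.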


-----Original Message-----
From: Trevor Watson [mailto:[email protected]] 
Sent: Friday, October 02, 2009 2:23 PM
To: [email protected]
Subject: Re: Alternative to looping through Hits

Thank you for the help with this so far.  I've been trying the different 
suggestions that you all posted here. 

The Lucene index contains a numeric field (the value I want), 4 text 
fields with simple data (e.g. Form, Publication, Email, or people's 
names), and 1 text field with the OCR of the image that is referenced by 
the line (very large in some cases).  The text fields are currently 
stored (for testing); hopefully making them tokenized only, without 
storing the actual text, will speed some things up.

The following is a list of the times the loops are taking.

Any advice on speeding any of them up?

Thanks in advance.

End Fieldable:22 seconds
--------------start code--------------
            Lucene.Net.Search.TopFieldDocCollector collector =
                new Lucene.Net.Search.TopFieldDocCollector(reader, Lucene.Net.Search.Sort.RELEVANCE, 100000);
            search.Search(query, null, collector);
            Lucene.Net.Search.TopDocs topDocs = collector.TopDocs();
            string[] values = new string[topDocs.scoreDocs.Length];
            LuceneUtilities.MyFieldSelector field_selector =
                new LuceneUtilities.MyFieldSelector("DocumentId");
            for(int i = 0; i < values.Length; i++)
            {
                Lucene.Net.Search.ScoreDoc score_document = topDocs.scoreDocs[i];
                Lucene.Net.Documents.Document document =
                    search.Doc(score_document.doc, field_selector);
                values[i] = document.GetFieldable("DocumentId").StringValue();
            }

            string csv = String.Join(" ,",values);
--------------end code--------------


End TopDocs, plus string:30 seconds
--------------start code--------------
            string docIds = "";  
            totalDocuments = hits.Length();

            dtStart = DateTime.Now;
            docIds = "";
            //Lucene.Net.Search.TopDocs topDocs = search.Search(query, null, 100000);
            topDocs = search.Search(query, null, 100000);

            foreach (Lucene.Net.Search.ScoreDoc sd in topDocs.scoreDocs)
            {
                Lucene.Net.Documents.Document docTest = search.Doc(sd.doc);
                docIds += docTest.GetField("DocumentId").StringValue() + ", ";
            }
            dtCurrent = DateTime.Now;
--------------end code--------------

End HitIterator (string array):29 seconds
--------------start code--------------
            Lucene.Net.Search.HitIterator hi =
                (Lucene.Net.Search.HitIterator)hits.Iterator();
            string[] sTest1 = new string[hits.Length()];
            int iCount1 = 0;

            dtStart = DateTime.Now;
            while (hi.MoveNext())
            {
                sTest1[iCount1] =
                    ((Lucene.Net.Search.Hit)hi.Current).GetDocument().GetField("DocumentId").StringValue();
                iCount1++;
                //docIds += ((Lucene.Net.Search.Hit)hi.Current).GetDocument().GetField("DocumentId").StringValue() + ", ";
            }
--------------end code--------------

End HitIterator (arrayList):30 seconds
--------------start code--------------
            hi = (Lucene.Net.Search.HitIterator)hits.Iterator();
            StringBuilder sb = new StringBuilder();
            ArrayList alTest = new ArrayList();

            dtStart = DateTime.Now;
            while (hi.MoveNext())
                alTest.Add(((Lucene.Net.Search.Hit)hi.Current).GetDocument().GetField("DocumentId").StringValue());

--------------end code--------------

End Hits (array):30 seconds
--------------start code--------------
            string[] sFinalDocs = new string[totalDocuments];
            for (int iCount = 0; iCount < totalDocuments; iCount++)
            {
                Lucene.Net.Documents.Document docHit = hits.Doc(iCount);
                //docIds += docHit.GetField("DocumentId").StringValue() + ", ";
                sFinalDocs[iCount] = docHit.GetField("DocumentId").StringValue();
            }
            docIds = String.Join(", ", sFinalDocs);
--------------end code--------------
