Re: A fast way to get real docID from large indexes?

Bissan AUDEH Wed, 12 Dec 2012 14:00:35 -0800

Thank you Carsten,
What I mean by a document's real name is any stored field in the index that 
represents the document (e.g. document title, file name in the file system, 
document location, ...), or anything else that you stored as a field at index 
time and wish to present to the user as a search result, because presenting 
the Lucene docID means nothing to the user.


What I'm actually doing is something like this:

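IndexSearcher searcher;  // opened on the index elsewhere
TopDocs results = searcher.search(query, numTotalHits);
ScoreDoc[] hits = results.scoreDocs;
// hits.length can be smaller than numTotalHits when fewer documents match
for (int i = 0; i < hits.length; i++)
{
   Document doc = searcher.doc(hits[i].doc);  // loads the stored fields of this hit
   System.out.println(hits[i].doc + " : " + hits[i].score);
}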

Unless I'm doing it wrong, the call "searcher.doc(hits[i].doc);" seems 
to be time-consuming for large indexes.
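
For illustration only, here is a minimal sketch of one way to avoid a stored-field read per hit: build a docID-to-name table once per index and do a constant-time array lookup for each hit. This is a toy model in plain Java, not Lucene API; the classes Hit and DocNameCache and their methods are hypothetical names made up for this sketch. In Lucene itself the table would have to be filled from the index (FieldCache is meant for this kind of per-document lookup), and whether that is faster in a given setup is an assumption that would need measuring.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a search hit: internal doc id plus score.
final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

// Toy model, NOT Lucene API: the array plays the role of a per-index cache
// that maps internal doc ids to the name shown to the user.
final class DocNameCache {
    private final String[] names;

    // Built once per index (or per reader reopen), not once per query.
    DocNameCache(String[] namesByDocId) {
        this.names = namesByDocId.clone();
    }

    // Constant-time lookup; no stored-field access per hit.
    String nameOf(int docId) {
        return names[docId];
    }

    // Formats each hit as "name : score"; iterates over the hits actually returned.
    List<String> render(Hit[] hits) {
        List<String> lines = new ArrayList<>(hits.length);
        for (Hit hit : hits) {
            lines.add(nameOf(hit.doc) + " : " + hit.score);
        }
        return lines;
    }
}
```

The trade-off in this sketch is memory for speed: the table holds one name per document in the index, so it only pays off when many queries are run against the same index, as in the experiments described here.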

I'll take a look at the AllDocCollector that you mentioned in your mail, hoping 
it will solve my problem.

On Wednesday, 12 December 2012 13:30 CET, Carsten Schnober 
<schno...@ids-mannheim.de> wrote:

> On 07.12.2012 15:12, Bissan Audeh wrote:
>
> > I'm doing some experiments with Lucene where I run many queries and I keep 
> > the top 1500 results of each query. I recently switched to Lucene 4.0, but 
> > in all cases I find that it takes a lot of time to get the REAL document id 
> > using ScoreDoc and IndexSearcher, especially since I have very large indexes.
> > Does anyone know a faster way?
> > It would be more efficient to have the document's real name as an attribute 
> > of the class ScoreDoc in addition to its Lucene id and its score, because 
> > this information is always needed to show retrieved documents.
>
>
> By "real" name, do you mean something like the input document title as
> opposed to the id assigned by Lucene during indexing? I've resolved this
> by storing document name in a dedicated field so that I can use it in a
> query or filter.
> If you refer to the Lucene index ids, you might be interested in using a
> Collector; the example "AllDocCollector" given in the textbook "Lucene
> in Action" (McCandless, Hatcher, Gospodnetić, 2nd ed., ch. 6) is
> probably helpful.
> Best,
> Carsten
>
> --
> Institut für Deutsche Sprache | http://www.ids-mannheim.de
> Projekt KorAP                 | http://korap.ids-mannheim.de
> Tel. +49-(0)621-43740789      | schno...@ids-mannheim.de
> Korpusanalyseplattform der nächsten Generation
> Next Generation Corpus Analysis Platform
