On 07.12.2012 at 15:12, Bissan Audeh wrote:

> I'm doing some experiments with Lucene where I run many queries and keep the
> top 1500 results of each query. I recently switched to Lucene 4.0, but in all
> cases I find that it takes a lot of time to get the REAL document id using
> ScoreDoc and IndexSearcher, especially since I have very large indexes.
> Does anyone know a faster way?
> It would be more efficient to have the document's real name as an attribute of
> the class ScoreDoc, in addition to its Lucene id and its score, because this
> information is always needed to show the retrieved documents.


By "real" name, do you mean something like the title of the input document,
as opposed to the id assigned by Lucene during indexing? I solved this by
storing the document name in a dedicated field, so that I can also use it
in a query or filter.
If you mean the Lucene index ids, you might be interested in using a
Collector; the "AllDocCollector" example in the textbook "Lucene in
Action" (McCandless, Hatcher, Gospodnetić, 2nd ed., ch. 6) is probably
helpful.
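
To illustrate the idea behind such a collector, here is a minimal plain-Java
sketch. It deliberately does not use the Lucene API: the names CollectorSketch,
Hit, AllHitsCollector and setNextSegment are hypothetical stand-ins. It only
models the pattern of being called once per index segment and per hit, and of
turning a segment-local id into an index-wide id by adding the segment's
document base.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectorSketch {

    /** Hypothetical stand-in for a hit: index-wide doc id plus score. */
    public static final class Hit {
        public final int doc;
        public final float score;

        public Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Hypothetical collector that keeps every hit it is handed. */
    public static final class AllHitsCollector {
        private final List<Hit> hits = new ArrayList<>();
        private int docBase; // id offset of the segment currently being searched

        /** Called when the search moves on to the next segment. */
        public void setNextSegment(int docBase) {
            this.docBase = docBase;
        }

        /** Called for each matching document; localDoc is relative to the segment. */
        public void collect(int localDoc, float score) {
            hits.add(new Hit(docBase + localDoc, score));
        }

        public List<Hit> getHits() {
            return hits;
        }
    }

    /** Simulates a search over two segments (assumed bases 0 and 100). */
    public static List<Hit> demo() {
        AllHitsCollector collector = new AllHitsCollector();
        collector.setNextSegment(0);
        collector.collect(2, 1.5f);
        collector.setNextSegment(100);
        collector.collect(2, 0.7f);
        return collector.getHits();
    }

    public static void main(String[] args) {
        for (Hit h : demo()) {
            System.out.println(h.doc + " " + h.score);
        }
    }
}
```

Both simulated hits have the local id 2, but they end up with different
index-wide ids because they come from different segments. A real collector
would get these calls from the searcher instead of from demo().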
Best,
Carsten

-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schno...@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform
