Dear list,

I am working on a search application that depends on retrieving offsets for each match. Currently (in Lucene 3.6), this seems to be overly costly, at least in my solution that looks like this:
-----------------------------------------------------------------------
TermPositionVector tfv;
int index;
TermVectorOffsetInfo[] offsets;

// Load the stored term vector for this document and field.
tfv = (TermPositionVector) reader.getTermFreqVector(docid, fieldname);
// Locate the term within the vector, then fetch all of its offsets.
index = tfv.indexOf(term.text());
offsets = tfv.getOffsets(index);
-----------------------------------------------------------------------

So I can use the appropriate TermVectorOffsetInfo from the offsets[] array to retrieve the offset information of a span. However, this slows down the search to an unacceptable level.

The thread 'Retrieving Offsets' (http://lucene.472066.n3.nabble.com/Retrieving-offsets-td3658238.html) suggests that there is no more efficient way to do this in Lucene 3.6. Am I right?

However, I understand that the patch LUCENE-3684 (https://issues.apache.org/jira/browse/LUCENE-3684) has improved the situation. I am now wondering whether it is worth migrating to Lucene 4.0 for the sake of search performance. It is not entirely clear to me, though, whether Lucene 4.0 alpha actually allows offsets to be retrieved from an index without having to read the TermFreqVector.

Can anyone give me some advice about the potential search performance gain for such an application and, ideally, some pointers on how to resolve the problem?

Thank you very much,
Carsten Schnober

-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789 | schno...@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform