Offsets in 3.6/4.0

Carsten Schnober Fri, 13 Jul 2012 05:30:45 -0700

Dear list,
I am working on a search application that depends on retrieving offsets
for each match. Currently (in Lucene 3.6), this seems to be overly
costly, at least in my solution that looks like this:


-----------------------------------------------------------------------
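// Needs the field indexed with term vectors that store offsets
// (Field.TermVector.WITH_OFFSETS or WITH_POSITIONS_OFFSETS).
TermPositionVector tfv =
    (TermPositionVector) reader.getTermFreqVector(docid, fieldname);
// Index of the term within this document's term vector (-1 if absent).
int index = tfv.indexOf(term.text());
// Start/end character offsets of every occurrence of the term.
TermVectorOffsetInfo[] offsets = tfv.getOffsets(index);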
-----------------------------------------------------------------------
So I can use the appropriate TermVectorOffsetInfo from the offsets[] array
to retrieve the offset information of a span. However, this slows down
the search to an unacceptable level.
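
To make clear what kind of data I mean by "offsets", here is a minimal,
Lucene-free sketch. It only illustrates the per-term start/end character
offsets that a term vector holds for one document; the class OffsetSketch,
the method buildOffsets and the simple \w+ tokenization are hypothetical
stand-ins, not Lucene API and not my actual analyzer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OffsetSketch {
    // Hypothetical stand-in for a per-document term vector:
    // maps each lowercased token to its [start, end) character offsets.
    static Map<String, List<int[]>> buildOffsets(String text) {
        Map<String, List<int[]>> offsets = new HashMap<>();
        Matcher m = Pattern.compile("\\w+").matcher(text);
        while (m.find()) {
            offsets.computeIfAbsent(m.group().toLowerCase(), k -> new ArrayList<>())
                   .add(new int[] { m.start(), m.end() });
        }
        return offsets;
    }

    public static void main(String[] args) {
        String doc = "Offsets in Lucene: offsets per match";
        // Look up all occurrences of one term, as with tfv.getOffsets(index).
        for (int[] span : buildOffsets(doc).get("offsets")) {
            System.out.println(span[0] + "-" + span[1]);
        }
    }
}
```

For the sample string this prints 0-7 and 19-26. In my application the same
lookup has to happen for every matching document, which is where the cost of
loading the term vector adds up.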

The thread 'Retrieving Offsets'
(http://lucene.472066.n3.nabble.com/Retrieving-offsets-td3658238.html)
suggests that there is no more efficient way to do this in Lucene 3.6.
Am I right?

However, I understand that the patch LUCENE-3684
(https://issues.apache.org/jira/browse/LUCENE-3684) has improved the
situation. I am now wondering whether it is worth migrating to Lucene
4.0 for the sake of search performance. It is not entirely clear to me,
though, whether Lucene 4.0 alpha actually allows offsets to be retrieved
from an index without having to read the TermFreqVector.

Can anyone give me some advice about the potential search performance
gain for such an application and, ideally, some pointers on how to
resolve the problem?

Thank you very much,
Carsten Schnober


-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schno...@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform


