Re: Dmitry's Term Vector stuff, plus some

Grant Ingersoll Tue, 24 Feb 2004 15:07:16 -0800

The start/end offsets are provided by Token.startOffset() and Token.endOffset() at
indexing time, I think.
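
To make the distinction concrete, here is a rough standalone sketch of the difference between a token's position (its ordinal in the token stream) and its start/end character offsets. It is not Lucene code: the Tok class and the tokenize() and highlight() helpers are hypothetical stand-ins I made up for illustration, and the whitespace tokenizer is an assumption, not what any real Analyzer does.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetSketch {

    // Hypothetical stand-in for a token; NOT the Lucene Token class.
    static final class Tok {
        final String text;
        final int startOffset; // index of the first character in the source text
        final int endOffset;   // index one past the last character
        final int position;    // ordinal of the token in the stream (0, 1, 2, ...)

        Tok(String text, int startOffset, int endOffset, int position) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.position = position;
        }
    }

    // Naive whitespace tokenizer (an assumption for this sketch) that records
    // both the ordinal position and the character offsets of every token.
    static List<Tok> tokenize(String s) {
        List<Tok> out = new ArrayList<>();
        int pos = 0;
        int i = 0;
        while (i < s.length()) {
            // Skip whitespace between tokens.
            while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token itself.
            while (i < s.length() && !Character.isWhitespace(s.charAt(i))) {
                i++;
            }
            if (i > start) {
                out.add(new Tok(s.substring(start, i), start, i, pos++));
            }
        }
        return out;
    }

    // Highlighting needs the character offsets; the position alone does not
    // say where in the original text the token sits.
    static String highlight(String s, Tok t) {
        return s.substring(0, t.startOffset)
                + "<b>" + s.substring(t.startOffset, t.endOffset) + "</b>"
                + s.substring(t.endOffset);
    }

    public static void main(String[] args) {
        String doc = "term vectors store positions";
        for (Tok t : tokenize(doc)) {
            System.out.println(t.text + " position=" + t.position
                    + " offsets=[" + t.startOffset + "," + t.endOffset + ")");
        }
        System.out.println(highlight(doc, tokenize(doc).get(2)));
    }
}
```

The point of the sketch is only that a highlighter wants the offsets, whereas a position tells you "third token" and nothing about where that token sits in the original text.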

I don't know if this is accessible at run time.  A good place to see what is stored in 
the files is the File Formats section located at 
http://jakarta.apache.org/lucene/docs/fileformats.html.  (Get the latest from HEAD to 
see the new Term Vector stuff).  For what you can access, I usually start at 
IndexReader and dig in from there.

Of course, the position info and how we did it is available in the first patch I
submitted (and in the original one from Dmitry), so if you are willing to always write
position information, you could update your code with that information.  Or, better
yet :-), take it, add the necessary touches to make it truly optional, and donate it
back to Lucene.

-Grant

>>> [EMAIL PROTECTED] 02/24/04 05:39PM >>>
Grant Ingersoll wrote:

> It is the location of the token in the document (see IndexReader.termPositions()).  
> This information is already being stored in other parts of the index, it just isn't 
> very efficient to get at it.  

Ok, that wasn't the answer I was hoping for :) I was hoping that the positions
referred to were the start/end offsets in the originating Token(s). I'll just have to
find another way to optimize the highlighting code to make it more efficient.

Regards,

Bruce Ritchie
http://www.jivesoftware.com/

