Hi,

I was browsing the term highlighting code in the sandbox and I noticed
the following comment for the getBestFragment method in the
Highlighter.java code:

        /**
...
         * @param tokenStream   a stream of tokens identified in the text parameter, including offset information.
         * This is typically produced by an analyzer re-parsing a document's
         * text. Some work may be done on retrieving TokenStreams more efficently
         * by adding support for storing original text position data in the Lucene
         * index but this support is not currently available (as of Lucene 1.4 rc2).
...
         */

It struck me that I might be able to contribute some time to make this
happen, since I recently submitted a patch that adds just such an
enhancement to the term vector.

I would like to implement this, but I don't really want to submit a
patch against another patch (it's hard enough managing all the changes
that come down). So, I was wondering if anyone (i.e. a committer) has
had a chance to look at the Term Vector offset patch and what their
thoughts are on it? I can see the performance improvements in the
highlighter that would come from not having to re-analyze the
text, plus you could highlight the whole field if you wanted to.
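
To make the idea concrete, here is a rough sketch of what highlighting
from stored offsets might look like, with no analyzer involved. The
class OffsetHighlightSketch, its highlight method, and the {start, end}
array representation are all hypothetical; none of them exist in the
Highlighter or in the term vector patch, and the offsets are simply
assumed to have been read back from the index somehow.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper for illustration only; not part of the Lucene
// Highlighter or of the term vector offset patch.
final class OffsetHighlightSketch {

    // offsets: {start, end} character positions (end exclusive) of the
    // matching terms. Assumed to be non-overlapping and assumed to come
    // from offset data stored in the index rather than from re-analysis.
    static String highlight(String text, int[][] offsets) {
        // Work on a sorted copy so the caller's array is left untouched.
        int[][] sorted = offsets.clone();
        Arrays.sort(sorted, Comparator.comparingInt(o -> o[0]));

        StringBuilder out = new StringBuilder();
        int last = 0;
        for (int[] o : sorted) {
            out.append(text, last, o[0]);   // untouched text before the term
            out.append("<B>").append(text, o[0], o[1]).append("</B>");
            last = o[1];
        }
        out.append(text, last, text.length()); // remainder after the last term
        return out.toString();
    }
}
```

Since nothing here depends on fragment boundaries, the same loop would
cover the whole field value, which is the "highlight the whole field"
case mentioned above.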

Also, if I make this change, should I keep the current ability to
re-analyze the text and offer stored offsets as an alternative, or would
it be safe to assume the highlighter is only used when offset info is
stored?

Thanks,
Grant

