Hi, I was browsing the term highlighting code in the sandbox and I noticed the following comment for the getBestFragment method in the Highlighter.java code:
/**
 * ...
 * @param tokenStream a stream of tokens identified in the text parameter, including offset information.
 * This is typically produced by an analyzer re-parsing a document's
 * text. Some work may be done on retrieving TokenStreams more efficently
 * by adding support for storing original text position data in the Lucene
 * index but this support is not currently available (as of Lucene 1.4 rc2).
 * ...
 */

It struck me that I might be able to contribute some time to make this so, since I recently submitted a patch that adds just such an enhancement to the term vectors. I would like to implement this, but I don't really want to submit a patch against another patch (it's hard enough managing all the changes that come down).

So, I was wondering whether anyone (i.e. a committer) has had a chance to look at the Term Vector offset patch, and what their thoughts are on it. I can see the performance improvement the highlighter would gain by not having to re-analyze the text, plus you could highlight the whole field if you wanted to.

Also, if I make this change, do the committers suggest I keep the current ability to analyze and offer this as an alternative, or would it be safe to assume the highlighter is only used when offset info is stored?

Thanks,
Grant