http://issues.apache.org/bugzilla/show_bug.cgi?id=18927

[PATCH] Term Vector support

------- Additional Comments From [EMAIL PROTECTED] 2004-08-19 12:10 -------

Term Vector support now optionally stores Token.getPositionIncrement(), Token.startOffset(), and Token.endOffset() information. This is controlled through the standard Field creation methods. All options are backward compatible (position and offset information will _not_ be stored by default).

Added many new test cases to demonstrate functionality. There are two new files needed: SegmentTermPositionVector and TermVectorOffsetInfo. All tests pass as of the morning of 8/19/04.

Attached should be 1 patch file plus a zip containing 2 new files.

What is this info good for?
1. I think the highlighter could use this info (offset) instead of reparsing every document at runtime
2. Many IR algorithms need character position, etc.
3. Others??

Remember, the values stored are based on what values you set when running the Analyzer (i.e. Token.startOffset, Token.endOffset, and Token.positionIncrement). These values are controlled by the application author and can vary by application.
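
To make the highlighter use case concrete, here is a rough standalone sketch of the idea: record a (start, end) character offset for each term occurrence once, at analysis time, and later highlight from those stored offsets without tokenizing the text again. This is not code from the patch and does not use the Lucene API. The class OffsetVectorSketch, the Offset type, and the methods buildVector and highlight are all hypothetical names, and the whitespace split is only a stand-in for a real Analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these names come from Lucene or from the patch.
class OffsetVectorSketch {

    // Stand-in for one stored (startOffset, endOffset) pair of a term occurrence.
    static final class Offset {
        final int start; // inclusive character offset
        final int end;   // exclusive character offset
        Offset(int start, int end) { this.start = start; this.end = end; }
    }

    // "Analyze" the text once by splitting on whitespace, recording the
    // character offsets of every occurrence of every (lowercased) term.
    static Map<String, List<Offset>> buildVector(String text) {
        Map<String, List<Offset>> vector = new TreeMap<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                vector.computeIfAbsent(term, k -> new ArrayList<>()).add(new Offset(start, i));
            }
        }
        return vector;
    }

    // Wrap each occurrence of the term in brackets using only the stored
    // offsets; the text itself is never re-tokenized here.
    static String highlight(String text, Map<String, List<Offset>> vector, String term) {
        List<Offset> hits = vector.get(term.toLowerCase(Locale.ROOT));
        if (hits == null) return text;
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Offset o : hits) { // offsets were recorded in increasing order
            out.append(text, last, o.start).append('[').append(text, o.start, o.end).append(']');
            last = o.end;
        }
        return out.append(text, last, text.length()).toString();
    }

    public static void main(String[] args) {
        String text = "Term vectors store term offsets";
        Map<String, List<Offset>> vector = buildVector(text);
        System.out.println(highlight(text, vector, "term"));
        // prints "[Term] vectors store [term] offsets"
    }
}
```

As with the stored values described above, the offsets here are whatever the tokenizing step chose to record, so a different analysis step would give different highlights.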