DO NOT REPLY [Bug 18927] - [PATCH] Term Vector support

bugzilla Thu, 19 Aug 2004 05:08:35 -0700

http://issues.apache.org/bugzilla/show_bug.cgi?id=18927

[PATCH] Term Vector support

------- Additional Comments From [EMAIL PROTECTED]  2004-08-19 12:10 -------
Term Vector support can now optionally store Token.getPositionIncrement(), 
Token.startOffset() and Token.endOffset() information.  This is controlled 
through the standard Field creation methods.  All options are backward 
compatible: position and offset information is _not_ stored by default.  Many 
new test cases were added to demonstrate the functionality.  Two new files are 
needed: SegmentTermPositionVector and TermVectorOffsetInfo.  All tests pass as 
of the morning of 8/19/04.

Attached should be one patch file plus a zip containing the two new files.

What is this info good for?
1. I think the highlighter could use the offset info instead of reparsing 
every document at runtime.
2. Many IR algorithms need character positions, etc.
3. Others??
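
To make item 1 concrete, here is a minimal sketch of highlighting from stored 
offsets without running the Analyzer again.  It does not use the classes from 
the patch: OffsetSpan and OffsetHighlightSketch are hypothetical stand-ins, and 
the sketch assumes the offsets are sorted, non-overlapping, and follow the 
Token convention of an inclusive start and an exclusive end.

```java
// Hypothetical stand-in for one stored (startOffset, endOffset) pair.
// This is NOT the TermVectorOffsetInfo class from the patch.
final class OffsetSpan {
    final int start; // inclusive, as reported by Token.startOffset()
    final int end;   // exclusive, as reported by Token.endOffset()

    OffsetSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

final class OffsetHighlightSketch {
    // Wraps every span in <b>...</b> using only the stored offsets.
    // Assumes spans are sorted by start and do not overlap.
    static String highlight(String text, OffsetSpan[] spans) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (OffsetSpan s : spans) {
            out.append(text, cursor, s.start);           // untouched text before the hit
            out.append("<b>").append(text, s.start, s.end).append("</b>");
            cursor = s.end;
        }
        out.append(text, cursor, text.length());         // tail after the last hit
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "term vectors store term data";
        // Offsets of the two occurrences of "term", as they might be stored.
        OffsetSpan[] hits = { new OffsetSpan(0, 4), new OffsetSpan(19, 23) };
        System.out.println(highlight(text, hits));
    }
}
```

The original text is still needed (for example from a stored field), but no 
tokenization happens at display time.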

Remember, the stored values are whatever your Analyzer sets on each Token 
(i.e. Token.startOffset, Token.endOffset and Token.positionIncrement).  These 
values are controlled by the application author and can vary by application.
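
As a rough illustration of why these values vary by application, the sketch 
below turns per-token position increments into absolute positions.  TokenSketch 
and PositionSketch are hypothetical names, not part of the patch, and the 
accumulation rule (start before the first token, add each increment) is an 
assumption about how positions are typically derived, not a copy of the 
indexing code.

```java
// Hypothetical stand-in for the three Token values mentioned above.
final class TokenSketch {
    final String text;
    final int positionIncrement;
    final int startOffset;
    final int endOffset;

    TokenSketch(String text, int positionIncrement, int startOffset, int endOffset) {
        this.text = text;
        this.positionIncrement = positionIncrement;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

final class PositionSketch {
    // Converts relative increments to absolute positions.
    // Assumption: counting starts at -1, so a first token with increment 1 lands on 0.
    static int[] positions(TokenSketch[] tokens) {
        int[] result = new int[tokens.length];
        int position = -1;
        for (int i = 0; i < tokens.length; i++) {
            position += tokens[i].positionIncrement;
            result[i] = position;
        }
        return result;
    }

    public static void main(String[] args) {
        // An analyzer that injects "fast" as a synonym of "quick" gives it
        // increment 0 and the same offsets, so both share one position.
        TokenSketch[] tokens = {
            new TokenSketch("quick", 1, 0, 5),
            new TokenSketch("fast", 0, 0, 5),
            new TokenSketch("fox", 1, 6, 9),
        };
        int[] pos = positions(tokens);
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i].text + " -> position " + pos[i]);
        }
    }
}
```

A different Analyzer over the same text could emit other increments or 
offsets, and the stored term vector data would differ accordingly.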

