http://nagoya.apache.org/bugzilla/show_bug.cgi?id=18927

[PATCH] Term Vector support

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Store                       |Website

------- Additional Comments From [EMAIL PROTECTED]  2004-02-09 16:16 -------
Below is the diff produced on the File Formats XML file located in xdocs, as
promised. I trust it will be checked for accuracy. Let me know if there are
any mistakes and I will fix them.

cvs diff -Nu fileformats.xml
Index: fileformats.xml
===================================================================
RCS file: /home/cvspublic/jakarta-lucene/xdocs/fileformats.xml,v
retrieving revision 1.6
diff -u -r1.6 fileformats.xml
--- fileformats.xml 13 Oct 2003 13:53:08 -0000 1.6
+++ fileformats.xml 9 Feb 2004 16:08:57 -0000
@@ -224,7 +224,11 @@
 multiplied into the score for hits on that field.
 </p>
 </li>
-
+ <li><p>Term Vectors. For each field in each document, the term vector
+ (sometimes called document vector) is stored. A term vector consists
+ of the term text, term frequency and term position.
+ </p>
+ </li>
 <li><p>Deleted documents. An optional file indicating which
 documents are deleted.
 </p>
@@ -804,9 +808,10 @@
 </p>
 
 <p>
- Currently only the low-order bit is used of FieldBits is used. It is
- one for
- indexed fields, and zero for non-indexed fields.
+ The low-order bit is one for
+ indexed fields, and zero for non-indexed fields. The second lowest-order
+ bit is one for fields that have term vectors stored, and zero for fields
+ without term vectors.
 </p>
 
 <p>
@@ -1112,6 +1117,57 @@
 </li>
 </ol>
 
+ </subsection>
+ <subsection name="Term Vectors">
+ Term Vector support is an optional on a field by field basis. It consists of 4
+ files.
+ <ol>
+ <li>
+ <p>The Document Index or .tvx file.</p>
+ <p>This contains, for each document, a pointer to the document data in the Document
+ (.tvd) file.
+ </p>
+ <p>DocumentIndex (.tvx) --> <DocumentPosition><sup>NumDocs</sup></p>
+ <p>DocumentPosition --> UInt64</p>
+ <p>This is used to find the position of the Document in the .tvd file.</p>
+ </li>
+ <li>
+ <p>The Document or .tvd file.</p>
+ <p>This contains, for each document, the number of fields, a list of the fields with
+ term vector info and finally a list of pointers to the field information in the .tvf
+ (Term Vector Fields) file.</p>
+ <p>
+ Document (.tvd) --> <NumFields, FieldNums, FieldPositions,><sup>NumDocs</sup>
+ </p>
+ <p>NumFields --> VInt</p>
+ <p>FieldNums --> <FieldNumDelta><sup>NumFields</sup></p>
+ <p>FieldNumDelta --> VInt</p>
+ <p>FieldPositions --> <FieldPosition><sup>NumFields</sup></p>
+ <p>FieldPosition --> VLong</p>
+ <p>The .tvd file is used to map out the fields that have term vectors stored and
+ where the field information is in the .tvf file.</p>
+ </li>
+ <li>
+ <p>The Field or .tvf file.</p>
+ <p>This file contains, for each field that has a term vector stored, a list of
+ the terms and their frequencies.</p>
+ <p>Field (.tvf) --> <NumTerms, NumDistinct, TermFreqs, TermPositionPointerDelta><sup>NumFields</sup></p>
+ <p>NumTerms --> VInt</p>
+ <p>NumDistinct --> VInt -- Future Use</p>
+ <p>TermFreqs --> <TermText, TermFreq><sup>NumTerms</sup></p>
+ <p>TermText --> String</p>
+ <p>TermFreq --> VInt</p>
+ <p>TermPositionPointerDelta --> VLong</p>
+ <p></p>
+ </li>
+ <li>
+ <p>The Positions or .tvp file.</p>
+ <p>This contains, for each term in the Field and Document, the positional information for
+ each term in the document. </p>
+ <p>Positions (.tvp) --> <PositionDelta><sup>NumPositions</sup></p>
+ <p>PositionDelta --> VInt</p>
+ </li>
+ </ol>
 </subsection>
 
 <subsection name="Deleted Documents">
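For anyone checking the diff above for accuracy, a small decoder may help when reading the VInt-based layouts. The Java below is only a sketch: it is not taken from the patch or from the Lucene sources, and the class and method names are hypothetical. It assumes the usual Lucene VInt encoding (seven payload bits per byte, low-order group first, high bit set when more bytes follow) and assumes that each FieldNumDelta is added to the previous field number, starting from zero, which the diff itself does not spell out. The FieldBits masks follow the wording of the second hunk.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper class for illustration only; not part of Lucene.
class TermVectorFormatSketch {
    // FieldBits masks as described in the patch text.
    static final int INDEXED_BIT = 0x1;      // low-order bit: field is indexed
    static final int TERM_VECTOR_BIT = 0x2;  // second lowest-order bit: term vector stored

    static boolean isIndexed(int fieldBits) {
        return (fieldBits & INDEXED_BIT) != 0;
    }

    static boolean hasTermVector(int fieldBits) {
        return (fieldBits & TERM_VECTOR_BIT) != 0;
    }

    // Decodes one VInt: 7 bits per byte, low-order group first,
    // high bit of each byte set when another byte follows.
    static int readVInt(ByteBuffer in) {
        int value = 0;
        for (int shift = 0; ; shift += 7) {
            int b = in.get() & 0xFF;
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
        }
    }

    // Reads `count` VInt deltas and returns the running totals.
    // Assumption: each delta is relative to the previous value, starting at 0.
    static int[] readDeltaList(ByteBuffer in, int count) {
        int[] values = new int[count];
        int previous = 0;
        for (int i = 0; i < count; i++) {
            previous += readVInt(in);
            values[i] = previous;
        }
        return values;
    }

    public static void main(String[] args) {
        // A made-up .tvd fragment: NumFields = 3, then FieldNumDeltas 3, 2, 5.
        ByteBuffer doc = ByteBuffer.wrap(new byte[] {3, 3, 2, 5});
        int numFields = readVInt(doc);
        int[] fieldNums = readDeltaList(doc, numFields);
        System.out.println("field numbers: " + Arrays.toString(fieldNums));
        System.out.println("FieldBits 0x3 has term vector: " + hasTermVector(0x3));
    }
}
```

The byte values in `main` are invented test data, not bytes from a real index. If the deltas in the real format turn out to be relative to something other than the previous entry, only `readDeltaList` would need to change.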