Hello everybody,
I am reading the file format documentation and checking it against an index I created.
The documentation says:
TermInfoIndex (.tii)--> TIVersion, IndexTermCount, IndexInterval,
SkipInterval, MaxSkipLevels, TermIndices
If I look into the .tii-file I see the following:
TIVersion = FF FF FF FC (4 Bytes)
IndexTermCount = 00 00 00 00 00 00 00 0C = 12 (8 Bytes)
IndexInterval = 00 00 00 80 = 128 (4 Bytes)
SkipInterval = 00 00 00 10 = 16 (4 Bytes)
MaxSkipLevels = 00 00 00 0A = 10 (4 Bytes)
TermIndices = ????? (? Bytes)
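The fixed-width fields listed above can be read with a short Python sketch. It assumes big-endian signed integers with the byte widths shown (4, 8, 4, 4, 4); the name parse_tii_header is made up for illustration and is not part of Lucene.

```python
import struct

# Assumed layout: big-endian int32, int64, int32, int32, int32 (24 bytes total).
TII_HEADER = struct.Struct(">iqiii")


def parse_tii_header(data):
    """Return the fixed-width header fields and the offset where TermIndices starts."""
    (version, term_count, index_interval,
     skip_interval, max_skip_levels) = TII_HEADER.unpack_from(data, 0)
    header = {
        "TIVersion": version,
        "IndexTermCount": term_count,
        "IndexInterval": index_interval,
        "SkipInterval": skip_interval,
        "MaxSkipLevels": max_skip_levels,
    }
    return header, TII_HEADER.size


# Header bytes as they appear in the hex dump above.
sample = bytes.fromhex("FFFFFFFC 000000000000000C 00000080 00000010 0000000A")
header, offset = parse_tii_header(sample)
for name, value in header.items():
    print(name, value)
print("TermIndices starts at byte", offset)
```

Whatever follows the returned offset is the TermIndices part, which is not fixed-width and so cannot be read with struct alone.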
I looked at two indexes, and in both of them the following byte sequence is
identical (marked bold):
*00 00 FF FF FF FF 0F 00 00 00 18 00* (0B 61 or 0D30 .....)
Maybe I don't understand the map *<TermInfo, IndexDelta>^IndexTermCount*.
How should I calculate the correct byte length?
I assume the IndexDelta, being a VLong, has 8 bytes if the leading bit is 0
(similar to VInt, or is VLong described somewhere?). TermInfo is explained in
the .tis file section.
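For comparison, the VInt rule from the documentation (seven data bits per byte, high bit set when another byte follows, least significant group first) can be sketched in Python as below. That VLong uses the same scheme, only with more bytes allowed, is an assumption here, and read_vint is a hypothetical helper name, not a Lucene API.

```python
def read_vint(buf, pos=0):
    """Decode one variable-length integer from buf starting at pos.

    Returns (value, next_pos). Each byte contributes its low 7 bits;
    a set high bit means that another byte follows.
    Assumption: VLong is decoded the same way, just with more bytes.
    """
    value = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if b & 0x80 == 0:
            return value, pos
        shift += 7


# A value below 128 fits in one byte; larger values spill into more bytes.
print(read_vint(bytes([0x18])))
print(read_vint(bytes([0x80, 0x01])))
```

Under this rule a value occupies one or more bytes depending on its size, so the byte length of a record is only known after decoding each of its fields in turn.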
TermIndices = <TermInfo, IndexDelta>
= <(Term, DocFreq, FreqDelta, ProxDelta, SkipDelta), IndexDelta>
= <([PrefixLength, Suffix, FieldNum], DocFreq, FreqDelta, ProxDelta, SkipDelta), IndexDelta>
= <([00, 00, FF], FF, FF, FF, 0F), 00 00 00 18 00 0B 61 6E>
IndexDelta is too large for my small index! DocFreq is also too large, because
I only have 16 documents in total. :(
Can somebody tell me how to read the bytes correctly from the file? I would
like to find the correct position in the .tis file from .tii data.
Best regards
Alex