Author: Mario
Email: [EMAIL PROTECTED]
Message:
A lot of you out there are experiencing serious data loss due to
inconsistencies in the cache log files.  In order to figure out what is wrong, I have
mapped out the cache log file format and will present it here in ASCII art :)

Things to know before going into major technical detail:
the first 3 bytes of the crc value are used for the tree path,
then the crc value is XORed by 0xfffff000 in order to yield 4096 unique words per file.
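
To make the arithmetic concrete, here is a rough sketch. It assumes the 0xfffff000 value acts as a bit mask (an AND rather than a literal XOR), since that is the reading under which the low 12 bits give 2^12 = 4096 words per file. The names split_crc, file_key and slot are made up for illustration and are not taken from the mnogo sources.

```python
FILE_MASK = 0xFFFFF000  # mask value quoted in the post


def split_crc(crc):
    """Split a 32-bit word crc into (file_key, slot).

    Assumption: the upper 20 bits pick the file, the lower 12 bits
    distinguish the words stored inside that file.
    """
    crc &= 0xFFFFFFFF
    file_key = crc & FILE_MASK              # identical for every word in one file
    slot = crc & (~FILE_MASK & 0xFFFFFFFF)  # 0..4095 within that file
    return file_key, slot


# Number of distinct low-bit values left free by the mask.
SLOTS_PER_FILE = (~FILE_MASK & 0xFFFFFFFF) + 1
```

For example, split_crc(0x12345678) gives a file key of 0x12345000 and a slot of 0x678.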

COOL, now what about the file itself?
The structures look something like this:
/-------Header------------\
|-Header Version-- 4 bytes|
|-Number of tables-4 bytes|
\-------------------------/
*The version is calculated from the number of extra flags you used when compiling mnogo
(such as --phrase, --fast_site, fast_tag, etc.)
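
A minimal sketch of reading that header, assuming both fields are unsigned 32-bit integers in little-endian byte order (the byte order is my guess for x86 hosts; adjust "<" if your files differ). read_header is a hypothetical helper name, not something from the mnogo code.

```python
import struct

# Assumption: two unsigned 32-bit little-endian integers, 8 bytes in total.
HEADER = struct.Struct("<II")


def read_header(buf):
    """Return (version, number_of_tables) from the start of a cache log file."""
    version, n_tables = HEADER.unpack_from(buf, 0)
    return version, n_tables
```

Usage would be something like read_header(open(path, "rb").read()), where path is whichever cache log file you are inspecting.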

This is followed by a run of tables indicating unique words and their weight (is that right?).
/-------Table(n)----------\
|-Word_id is crc value  4b|
|-Weight                4b|
|-filepos               4b|
|-bytelength(word run)  4b|
\-------------------------/
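
The table run could be walked roughly like this. Same assumptions as for the header: unsigned 32-bit little-endian fields, with the tables starting right after the 8-byte header. TableEntry and read_tables are names invented for this sketch.

```python
import struct
from collections import namedtuple

# Field names follow the diagram above; the tuple type itself is hypothetical.
TableEntry = namedtuple("TableEntry", "word_id weight filepos bytelength")

# Assumption: four unsigned 32-bit little-endian integers per entry (16 bytes).
ENTRY = struct.Struct("<IIII")


def read_tables(buf, n_tables, offset=8):
    """Read n_tables entries, starting just past the 8-byte header."""
    entries = []
    for i in range(n_tables):
        fields = ENTRY.unpack_from(buf, offset + i * ENTRY.size)
        entries.append(TableEntry(*fields))
    return entries
```

Each entry's filepos and bytelength then tell you where that word's run sits and how long it is.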
and then the word runs themselves, made up of records like this:
/------word(n,p)----------\
|*depending on what word  |
|constructs you used, this| 
|will vary.               |
|-(permanent)urlid  4bytes|
|-(variable)word_pos 4b   |
|-(variable)site_id  4b   |
|-(variable)category 4b   |
|-(variable)tag      4b   |
\-------------------------/
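
Since the record size changes with the compile options, a reader has to know which variable fields are present. The sketch below assumes the optional fields appear in the order drawn above, that each is an unsigned 32-bit little-endian integer, and that filepos counts from the start of the file. read_word_run and the "enabled" set are hypothetical; which compile flag switches on which field is not spelled out here.

```python
import struct

# Optional fields in the order shown in the diagram (assumed on-disk order).
OPTIONAL_FIELDS = ("word_pos", "site_id", "category", "tag")


def read_word_run(buf, filepos, bytelength, enabled):
    """Decode one word run into a list of dicts.

    enabled: set of optional field names compiled into the indexer.
    """
    fields = ("url_id",) + tuple(f for f in OPTIONAL_FIELDS if f in enabled)
    rec = struct.Struct("<%dI" % len(fields))
    if bytelength % rec.size:
        # A length that does not divide evenly suggests the file was
        # written with a different set of fields than we are reading with.
        raise ValueError("run length does not match record size")
    return [
        dict(zip(fields, rec.unpack_from(buf, pos)))
        for pos in range(filepos, filepos + bytelength, rec.size)
    ]
```

The length check only catches some mismatches: a 24-byte run divides evenly into both 8-byte and 12-byte records, so a wrong field set can still decode without an error.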

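Because the word record layout depends on the compile flags, compiling different
versions of indexer using various flags (--phrase, --fasttag, etc.) and running them
against the same cache files may cause indexer to split the data incorrectly.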

That's all for now.

Mario

Reply: <http://www.mnogosearch.org/board/message.php?id=2485>
