Author: Mario
Email: [EMAIL PROTECTED]
Message:

There are a lot of you out there who are experiencing a serious loss of data due to data inconsistencies in the cache log files. In order to figure out what is wrong, I have mapped out the cache log file format and will present it here in ASCII art :)

Things to know before going into major tech detail: the first 3 bytes of the crc value are used for the tree path, then the crc value is XORed by 0xfffff000 in order to yield 4096 unique words per file.

COOL, now what about the file itself? The structures look something like this:

/-------Header------------\
|-Header version- 4 bytes |
|-Number of tables-4 bytes|
\-------------------------/

*version is calculated by the number of extra flags you used when compiling mnogo (such as --phrase, --fast_site, fast_tag, etc...)

followed by a run of tables indicating unique words and their weight (is that right?).

/-------Table(n)----------\
|-Word_id is crc value 4b |
|-Weight               4b |
|-filepos              4b |
|-bytelength(word run) 4b |
\-------------------------/

and then even more of these

/------word(n,p)----------\
|*depending on what word  |
|constructs you used, this|
|will vary.               |
|-(permanent)urlid 4bytes |
|-(variable)word_pos 4b   |
|-(variable)site_id 4b    |
|-(variable)category 4b   |
|-(variable)tag 4b        |
\-------------------------/

Compiling different versions of indexer, using various flags (--phrase, --fasttag, etc....) may cause indexer to split the wrong data.

That's all for now.

Mario

Reply: <http://www.mnogosearch.org/board/message.php?id=2485>
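
For anyone who wants to poke at a file by hand, here is a rough Python sketch of a reader for the layout described above. It is not taken from the mnogosearch source. It assumes that every field is an unsigned 32-bit little-endian integer, that the tables sit directly after the header, and that filepos is an absolute offset from the start of the file; none of this is confirmed in the post. The names parse_cache_log and variable_fields are made up for illustration.

```python
import struct

# Assumption: all fields are unsigned 32-bit little-endian integers.
HEADER = struct.Struct("<II")    # header version, number of tables
TABLE = struct.Struct("<IIII")   # word_id (crc), weight, filepos, byte length of word run


def parse_cache_log(data, variable_fields=()):
    """Parse a bytes object laid out as header, tables, then word runs.

    variable_fields names the optional 4-byte fields that follow urlid in
    each word record, e.g. ("word_pos", "site_id"). It has to match the
    flags the indexer was compiled with.
    """
    version, table_count = HEADER.unpack_from(data, 0)
    fields = ("urlid",) + tuple(variable_fields)
    record = struct.Struct("<%dI" % len(fields))

    tables = []
    for n in range(table_count):
        # Assumption: table entries are packed back to back after the header.
        offset = HEADER.size + n * TABLE.size
        word_id, weight, filepos, length = TABLE.unpack_from(data, offset)
        if length % record.size != 0:
            # A run that is not a whole number of records suggests that the
            # reader and the writer disagree about the record layout.
            raise ValueError("word run for %08X does not match record size" % word_id)
        # Assumption: filepos is an absolute offset from the start of the file.
        words = [
            dict(zip(fields, record.unpack_from(data, filepos + pos)))
            for pos in range(0, length, record.size)
        ]
        tables.append({"word_id": word_id, "weight": weight, "words": words})
    return version, tables
```

Calling parse_cache_log(data, ("word_pos",)) on a file written with a different set of variable fields will often hit the ValueError above, which is the same kind of layout mismatch that mixing indexer builds with different flags would produce.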
