On Wed, 4 Sep 2002, Walantis Giosis wrote:

> The ID bytes for length information (excerpt length, document size, URL
> length) vary. Say we have a document size of less than 100h bytes.
> Then the ID byte has the value 44h for that information. The size
> needs only one byte. If the size exceeds 100h bytes (it needs two or
> more bytes) then the ID byte has the value 84h. What's the logic
> behind this? Only to determine the byte count for the size? At the
> moment I've handled it using a switch/case statement.

Hans-Peter Nilsson rewrote the Serialize/Deserialize routines very carefully, so I can't speak authoritatively. I think he was trying to save as much space as possible. AFAICT, there's a marker indicating the size (sizeof()) of the next value coming up. Take a look at htcommon/DocumentRef.cc::Serialize() to see the code.

> And why is the document size information stored twice in the database?

They should be different. See htcommon/DocumentRef.[cc,h], which deals with the document DB records. In particular, there's the text size of the document and, optionally, it can figure out the size of the document including all images.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

_______________________________________________
htdig-dev mailing list
https://lists.sourceforge.net/lists/listinfo/htdig-dev
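
The marker idea discussed above could look roughly like the sketch below. This is not the code in htcommon/DocumentRef.cc. It is an illustration built only from the 44h/84h values reported in this thread, assuming the low bits of the ID byte hold a field ID (4 here) and the top two bits flag a one-byte or two-byte value. The names putField, getField, kCharSizeMarker, kShortSizeMarker and kFieldIdMask are hypothetical, and so are the little-endian byte order and the 4-byte fallback width.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout, inferred only from the 44h/84h values in this thread:
// low 6 bits = field ID, bit 6 set = value stored in 1 byte,
// bit 7 set = value stored in 2 bytes, neither set = 4-byte value.
const std::uint8_t kCharSizeMarker  = 0x40;  // hypothetical name
const std::uint8_t kShortSizeMarker = 0x80;  // hypothetical name
const std::uint8_t kFieldIdMask     = 0x3F;  // hypothetical name

// Append one numeric field, choosing the narrowest width that holds it.
void putField(std::vector<std::uint8_t>& out, std::uint8_t id,
              std::uint32_t value) {
    int width;
    std::uint8_t marker;
    if (value <= 0xFF)        { width = 1; marker = kCharSizeMarker; }
    else if (value <= 0xFFFF) { width = 2; marker = kShortSizeMarker; }
    else                      { width = 4; marker = 0; }
    out.push_back(static_cast<std::uint8_t>(id | marker));
    for (int i = 0; i < width; ++i)  // little-endian order is an assumption
        out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
}

// One decoded field plus the offset of the byte that follows it.
struct Field {
    std::uint8_t  id;
    std::uint32_t value;
    std::size_t   next;
};

// Read the field starting at pos; the marker bits give the byte count,
// so no per-ID switch/case is needed to find the width.
Field getField(const std::vector<std::uint8_t>& in, std::size_t pos) {
    std::uint8_t tag = in[pos++];
    int width = (tag & kCharSizeMarker) ? 1
              : (tag & kShortSizeMarker) ? 2
              : 4;
    std::uint32_t value = 0;
    for (int i = 0; i < width; ++i)
        value |= static_cast<std::uint32_t>(in[pos++]) << (8 * i);
    return Field{static_cast<std::uint8_t>(tag & kFieldIdMask), value, pos};
}
```

Under these assumptions, a size of 7Fh with field ID 4 serializes to the bytes 44 7F, and a size of 1234h to 84 34 12. A reader can therefore mask off the top two bits to recover the field ID instead of treating 44h and 84h as separate IDs.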