---------------------------------------<snip>-----------------------------------

In certain isolated cases, a record may GROW instead of shrinking when going
through the ARCHIVER's compaction process (which uses the Huffman algorithm).


That is interesting. I thought one of the attributes of the Huffman algorithm was that expansion due to the substitution is impossible, so long as the codes are built by scanning the FULL file first to gather the occurrence statistics. I assume your problem is therefore due to using a sample of the file to represent the full file, with the codepoint distribution of the full file differing from that of the sample section (i.e., the un-sampled portion contains enough of the codepoints that were assigned the longer bit strings to make the sample's statistics inaccurate).
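The expansion mechanism described above is easy to demonstrate. A minimal sketch in Python (purely illustrative — the archiver itself obviously uses nothing of the sort, and the sample text, record contents, and `build_huffman_codes` helper are all hypothetical): build a Huffman table from a skewed "training" sample, then encode a record made of symbols that were rare in the sample. Those symbols carry codes longer than 8 bits, so the "compressed" record grows.

```python
import heapq
from collections import Counter

def build_huffman_codes(freq):
    """Build a Huffman code table (symbol -> bit string) from frequencies."""
    # Heap entries are (weight, tiebreaker, tree); the unique tiebreaker
    # keeps heapq from ever comparing two trees directly.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tiebreak, (t1, t2)))
        tiebreak += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):        # internal node: recurse both ways
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                              # leaf: record the code
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

# Hypothetical "sample": overwhelmingly one byte value, every other byte rare.
sample = b"a" * 10000 + bytes(range(256))
codes = build_huffman_codes(Counter(sample))

# A record drawn from the un-sampled tail: 80 copies of a rarely-seen byte,
# which was assigned one of the longest codes.
rare = max(codes, key=lambda s: len(codes[s]))
record = bytes([rare]) * 80

encoded_bits = sum(len(codes[b]) for b in record)
print(len(record) * 8, "input bits ->", encoded_bits, "output bits")
# The encoded form is LARGER than the raw record: expansion, not compression.
```

The same effect follows from any fixed tree: symbols under-represented in the statistics get codes longer than 8 bits, and a record dominated by them must expand.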

----------------------------------<unsnip>--------------------------------------
You have just described the Adaptive Huffman algorithm. Archiver currently uses a hard-coded Huffman tree that was built from a character count of everything on my SYSRES and DLIB volumes in my shop at the time. Back then, I had never heard of, or seen, an adaptive Huffman algorithm.

Rick

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html