---------------------------------------<snip>-----------------------------------
In certain isolated cases, a record may GROW instead of shrink when going
through the ARCHIVER's compaction process (which uses the Huffman algorithm).
That is interesting. I thought that one of the attributes of the
Huffman algorithm was that expansion due to the substitution was
impossible (so long as the codes were built by scanning the FULL file
first to get the occurrence statistics). I assume your problem is thus
due to using a sample of the file to represent the full file, and that
the codepoint distribution of the full file is not the same as that of
the sampled section (i.e., the un-sampled portion contains enough of the
codepoints that were assigned the longer codes to make the sample's
statistics inaccurate).
----------------------------------<unsnip>--------------------------------------
You have just described the Adaptive Huffman algorithm. Archiver
currently uses a hard-coded Huffman tree that was built from a
character count of everything on my SYSRES and DLIB volumes in my shop
at the time. Back then, I had never heard of, or seen, an adaptive
Huffman algorithm.
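The effect described above is easy to demonstrate. Below is a minimal sketch (not Archiver's actual code or tree) of a static Huffman table built from one sample's byte counts; a record made of bytes that were rare in the sample gets long codes and comes out larger than 8 bits per byte. The sample text and record contents are invented for illustration.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code table {byte: bitstring} from byte counts."""
    seq = count()  # tie-breaker so the heap never has to compare tree nodes
    heap = [(f, next(seq), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(seq), (a, b)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: recurse both ways
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: a byte value
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

# "Sample" statistics: mostly text, but every byte value counted at least
# once so the static table can encode anything (as a hard-coded tree must).
sample_counts = Counter({b: 1 for b in range(256)})
sample_counts.update(b"the quick brown fox jumps over the lazy dog " * 2000)
codes = huffman_codes(sample_counts)

# A record whose distribution differs from the sample: all rare byte values.
record = bytes(range(0, 32)) * 8
encoded_bits = sum(len(codes[b]) for b in record)
print(encoded_bits, "bits vs", 8 * len(record), "bits uncompressed")
```

Running it shows the encoded record needing roughly twice the bits of the raw record, since the rare bytes all drew codes much longer than 8 bits from the sample-derived tree.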
Rick
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html