Hmm, no responses? Well then, I created my own Huffman format in honor of its inventor, David Albert Huffman, who I officially bow before in this very moment.
*Final format "HUF1"* (few changes from the draft above) *Part 1 (magic code, optional): "HUF1" in ASCII* *Part 2: Huffman tree starting from root. From here, we write bits, filling each byte from the lowest bit.* *For each node of the tree, we transmit:* *-a bit of 0 if it's a leaf, followed by the 8 bits of the character* *-a bit of 1 if it's a branch node, followed by the subtree for the zero bit and then the subtree for the one bit.* (Contrary to the draft, We don't fill up the current byte, rather we use any remaining bits in the current byte for the next part.) *Part 3a: Length of the uncompressed text in bytes*, encoded as a variable-size int with chunk size 3 *Part 3b: The actual text, encoded using the tree above* Definition "Variable-size int with chunk size 3" <https://code.botcompany.de/1035687>: The unsigned integer to be transmitted is split in 3 bit chunks. Step 1: You transmit the lowest 3 bits. Step 2a: If the integer has been transmitted completely, transmit a 0 bit and end. Step 2b: If there are untransmitted bits, shift the remaining number 3 bits to the right and go back at step 1. Implementation Java code for reading/writing HUF1 files. <https://code.botcompany.de/1035674> Unit test. <https://code.botcompany.de/1035680> Verdict: The format is pretty efficient and minimal, even for short texts, especially when you drop the 4 magic bytes. And of course, Huffman compression and decompression are O(n) in the length of the text (assuming fixed symbol set size). [Application: My plan is to operate mostly with text file formats for their massive convenience and reliability increase (including integers spelled out in decimal/hex, english code words etc), and simply put those text files through the Huffman compressor in order to get a half decent file size.] ------------------------------------------ Artificial General Intelligence List: AGI Permalink: https://agi.topicbox.com/groups/agi/Tc2bc76d436054024-M582e346de8d6a69d1580d854 Delivery options: https://agi.topicbox.com/groups/agi/subscription