Hmm, no responses? Well then, I created my own Huffman format in honor of its 
inventor, David Albert Huffman, who I officially bow before in this very moment.

*Final format "HUF1"*
(few changes from the draft above)

*Part 1 (magic code, optional): "HUF1" in ASCII*

*Part 2: Huffman tree starting from root. From here, we write bits, filling 
each byte from the lowest bit.*
*For each node of the tree, we transmit:*
*-a bit of 0 if it's a leaf, followed by the 8 bits of the character*
*-a bit of 1 if it's a branch node, followed by the subtree for the zero bit 
and then the subtree for the one bit.*

(Contrary to the draft, We don't fill up the current byte, rather we use any 
remaining bits in the current byte for the next part.)

*Part 3a: Length of the uncompressed text in bytes*, encoded as a variable-size 
int with chunk size 3
*Part 3b: The actual text, encoded using the tree above*

Definition "Variable-size int with chunk size 3" 
<https://code.botcompany.de/1035687>: The unsigned integer to be transmitted is 
split in 3 bit chunks.
Step 1: You transmit the lowest 3 bits.
Step 2a: If the integer has been transmitted completely, transmit a 0 bit and 
end.
Step 2b: If there are untransmitted bits, shift the remaining number 3 bits to 
the right and go back at step 1.

Implementation

Java code for reading/writing HUF1 files. <https://code.botcompany.de/1035674> 
Unit test. <https://code.botcompany.de/1035680>

Verdict: The format is pretty efficient and minimal, even for short texts, 
especially when you drop the 4 magic bytes.  And of course, Huffman compression 
and decompression are O(n) in the length of the text (assuming fixed symbol set 
size).

[Application: My plan is to operate mostly with text file formats for their 
massive convenience and reliability increase (including integers spelled out in 
decimal/hex, english code words etc), and simply put those text files through 
the Huffman compressor in order to get a half decent file size.]
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tc2bc76d436054024-M582e346de8d6a69d1580d854
Delivery options: https://agi.topicbox.com/groups/agi/subscription

Reply via email to