[Haskell-cafe] Re: ANN: bitspeak 0.0.1

Maurício CA Wed, 23 Jun 2010 20:00:22 -0700

Sure, Huffman was actually my first tought. But I couldn't think
of a pratical display for the result of Huffman encoding that
could be easily followed by a human looking at the screen. Since
it's an optimal code, letters would not be grouped in alphabetical
order.


There is a compromise.
There is such a thing as an ORDERED Huffman code.
Consider a set of strings.
If they call begin with the same first letter,
assume that letter and consider the suffixes instead.
Otherwise, choose a letter L such that
as close as possible to half of the strings begin
with a letter preceding L in the alphabet
as close as possible to half of the strings begin
with the letter L or a later letter.


I believe that's what I've done. I use this file to give weight
to letters:

  http://bitbucket.org/mauricio/bitspeak/src/tip/src/Corpora.hs

After each letter is selected I keep only sufixes after that
letter, and then I use 'splitData' to find the point where
I'll separate both halves:

  http://bitbucket.org/mauricio/bitspeak/src/tip/src/Main.hs

Best,

Maurício

_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: ANN: bitspeak 0.0.1

Reply via email to