Sure, Huffman was actually my first tought. But I couldn't think
of a pratical display for the result of Huffman encoding that
could be easily followed by a human looking at the screen. Since
it's an optimal code, letters would not be grouped in alphabetical
order.

There is a compromise.
There is such a thing as an ORDERED Huffman code.
Consider a set of strings.
If they call begin with the same first letter,
assume that letter and consider the suffixes instead.
Otherwise, choose a letter L such that
as close as possible to half of the strings begin
with a letter preceding L in the alphabet
as close as possible to half of the strings begin
with the letter L or a later letter.

I believe that's what I've done. I use this file to give weight
to letters:

  http://bitbucket.org/mauricio/bitspeak/src/tip/src/Corpora.hs

After each letter is selected I keep only sufixes after that
letter, and then I use 'splitData' to find the point where
I'll separate both halves:

  http://bitbucket.org/mauricio/bitspeak/src/tip/src/Main.hs

Best,

Maurício

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to