Bulat Ziganshin wrote:
Hello Andrew,
Sunday, July 8, 2007, 7:07:59 PM, you wrote:
i don't think that ppm is so complex - it's just the probability of a
symbol in some context. it's just too slow in a naive implementation
Oh, sure, the *idea* is simple enough. Trying to actually *implement* it
correctly is something else... ;-)
(Same statements go for arithmetic coding, really.)
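
For the curious: the "probability of a symbol in some context" idea really
does fit in a few lines. The sketch below is purely my own illustration (the
names Model, update, predict and train are invented), and it is exactly the
sort of naive Data.Map implementation that is far too slow for real use:

import qualified Data.Map as M
import Data.List (foldl')
import Data.Word (Word8)

-- For each one-byte context, a table of counts of the symbols seen after it.
type Model = M.Map Word8 (M.Map Word8 Int)

-- Record that symbol 'sym' was seen in context 'ctx'.
update :: Word8 -> Word8 -> Model -> Model
update ctx sym = M.insertWith (M.unionWith (+)) ctx (M.singleton sym 1)

-- Estimated probability of 'sym' in context 'ctx'.  A real PPM coder would
-- "escape" to a shorter context instead of returning 0 for unseen pairs.
predict :: Word8 -> Word8 -> Model -> Double
predict ctx sym m = case M.lookup ctx m of
  Nothing    -> 0
  Just table -> fromIntegral (M.findWithDefault 0 sym table)
              / fromIntegral (sum (M.elems table))

-- Build the model by walking adjacent byte pairs.
train :: [Word8] -> Model
train bytes = foldl' (\m (c, s) -> update c s m) M.empty (zip bytes (drop 1 bytes))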
(The downside of course is that now we need a Huffman table in the
output - and any rare symbols might end up with rather long codewords.
But remember: Huffman *guarantees* 0% compression or higher on the
payload itself. A Huffman-compressed payload can *never* be bigger, only
smaller or same size. So long as you can encode the Huffman table
efficiently, you should be fine...)
the devil is in the details. just imagine the size of a huffman table with
64k entries :) huffman encoding is inappropriate for lzw output simply
because most words will have only a few occurrences, and the economy from
encoding them optimally doesn't justify the price of their entries in the table
...which is why you need to "encode the Huffman table efficiently", to
quote myself. ;-)
Using canonical Huffman, you only actually need to know how many bits
were assigned to each symbol. This information is probably very
amenable to RLE. (Which, incidentally, is why I started this whole
"parser on top of a phaser" craziness in the first place.) So, yeah,
there may be 64k symbols - but if only 1k of them are ever *used*... ;-)
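
To make the "only the bit lengths" point concrete, here is a little sketch -
my own illustration, not code from any real codec - of canonical code
assignment. Given each used symbol's code length, the codewords fall out
deterministically, so all you ever transmit is the length list, and RLE eats
that for breakfast when most of the 64k symbols have length zero:

import Data.Bits (shiftL)
import Data.List (sortBy)
import Data.Ord (comparing)

-- Given (symbol, codeLength) pairs, with length 0 meaning "unused",
-- assign canonical codes: sort by (length, symbol), then hand out
-- consecutive code values, shifting left each time the length grows.
-- Returns (symbol, (length, code)); the code occupies 'length' bits.
canonicalCodes :: [(Int, Int)] -> [(Int, (Int, Int))]
canonicalCodes lens = go 0 0 used
  where
    used = sortBy (comparing (\(sym, len) -> (len, sym)))
                  [ p | p@(_, len) <- lens, len > 0 ]
    go _    _    []                  = []
    go code prev ((sym, len) : rest) =
      let code' = code `shiftL` (len - prev)
      in  (sym, (len, code')) : go (code' + 1) len rest

For example, canonicalCodes [(0,2),(1,1),(2,3),(3,3),(4,0)] assigns symbol 1
the 1-bit code 0, symbol 0 the 2-bit code 10, symbols 2 and 3 the 3-bit codes
110 and 111, and nothing at all to the unused symbol 4.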
.ru = Russia?
of course
My Russian is very rusty. ;-)
Oh hey, I think GHC is already pretty smart. But no optimiser can ever
hope to cover *every* possible case. And transforming [Bool] -> [Bool]
into UArray Word8 -> UArray Word8 just seems a liiiiittle bit
optimistic, methinks. ;-)
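
Just to spell out the gap such an optimisation would have to bridge, here is
a hand-written pack/unpack pair - my own illustration, nothing GHC actually
does: a [Bool] spends a whole heap cell per bit, while the UArray packs eight
bits into each Word8.

import Data.Array.Unboxed (UArray, listArray, elems)
import Data.Bits (setBit, testBit)
import Data.List (foldl')
import Data.Word (Word8)

-- Pack a list of bits into bytes, 8 bits per Word8, MSB first; the last
-- byte is zero-padded, so the round trip pads the length up to a multiple of 8.
packBits :: [Bool] -> UArray Int Word8
packBits bits = listArray (0, length bytes - 1) bytes
  where
    bytes     = map toByte (chunks bits)
    toByte bs = foldl' (\w (i, b) -> if b then setBit w (7 - i) else w)
                       0 (zip [0 ..] bs)
    chunks [] = []
    chunks xs = let (h, t) = splitAt 8 xs in h : chunks t

-- Unpack back to a list of bits.
unpackBits :: UArray Int Word8 -> [Bool]
unpackBits arr = [ testBit w (7 - i) | w <- elems arr, i <- [0 .. 7] ]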
15 years ago i wrote a very smart asm program (btw, it was an ARJ
unpacker) with handmade function inlining, loop unrolling, register
allocation, cpu recognition and so on. now, most of these tricks are
standard for C compilers. times change, and it's hard to imagine which
optimizations will be available 10 years from now
Yes, but there are limits to what an optimiser can hope to accomplish.
For example, you wouldn't implement a bubble sort and seriously expect
the compiler to be able to "optimise" that into a merge sort, would you? ;-)
ghc's native and via-C modes are a choice between blind and lame. in native
mode, its code generator is comparable to 20-year-old C code generators. see
above how much modern C compilers have changed in those years. in via-C
mode it generates unnatural C code which is hard for any C compiler to
optimize.
I'll take your word for it. ;-)
(I have made cursory attempts to comprehend the inner workings of GHC -
but this is apparently drastically beyond my powers of comprehension.)
jhc is a very different story
Yes - last I heard, it's an experimental research project rather than a
production-ready compiler...