Let me state one more time why a lossless model has more knowledge. If x and x' have the same meaning to a lossy compressor (they compress to identical codes), then the lossy model only knows p(x)+p(x'). A lossless model also knows p(x) and p(x'). You can argue that if x and x' are not distinguishable then this extra knowledge is not important. But all text strings are distinguishable to humans.

There is a difference between information and knowledge. Your argument is 100% correct for information. It is not correct for knowledge. Information only counts as knowledge if it is *usable*. PKZip has exactly ONE piece of knowledge --> the exact string that was fed to it. It can't do anything else with what it has other than reproduce that string.

Also in the opinion of speech recognition researchers studying language models since the early 1990's.

Duh. If your purpose is to recognize speech, then you don't want to lose any of it. Your stated purpose was different -- thus, it makes sense to have different judging criteria -- like, maybe, ones that are dictated by your goals.

Deciding if a lossy decompression is "close enough" is an AI problem, or it requires subjective judging by humans.

Absolutely not. We've covered this before. You can judge how much knowledge a file contains by requiring that the decompression program output it in a standard canonical form. The "smartest" program will probably output far more knowledge than a team of puny humans could develop in a large number of man-years (as well as give you some ideas for useful research projects).

- - - - -

Seriously, dude -- I DO understand your defense of the contest, but insisting on lossless compression has *nothing* to do with KNOWLEDGE (though, maybe everything to do with judging).

----- Original Message ----- From: "Matt Mahoney" <[EMAIL PROTECTED]>
To: <agi@v2.listbox.com>
Sent: Friday, August 25, 2006 7:54 PM
Subject: Re: [agi] Lossy *&* lossless compression


----- Original Message ----
From: Mark Waser <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Sent: Friday, August 25, 2006 5:58:02 PM
Subject: Re: [agi] Lossy *&* lossless compression

However, a machine with a lossless model will still outperform one with a
lossy model because the lossless model has more knowledge.

PKZip has a lossless model.  Are you claiming that it has more knowledge?
More data/information *might* be arguable, but certainly not knowledge -- and
PKZip certainly can't use any "knowledge" that you claim it "has".

DEL has a lossy model, and nothing compresses smaller. Is it smarter than PKZip?

Let me state one more time why a lossless model has more knowledge. If x and x' have the same meaning to a lossy compressor (they compress to identical codes), then the lossy model only knows p(x)+p(x'). A lossless model also knows p(x) and p(x'). You can argue that if x and x' are not distinguishable then this extra knowledge is not important. But all text strings are distinguishable to humans.
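To make the p(x) vs. p(x)+p(x') point concrete, here is a minimal sketch; the two strings and their probabilities are made up purely for illustration:

    import math

    # Two strings a lossy coder would treat as "the same meaning";
    # the probabilities are invented for the example.
    p = {"the cat sat": 0.03, "the cat sat.": 0.01}

    # Lossless model: each string keeps its own probability and code length.
    for x, px in p.items():
        print(f"lossless: p({x!r}) = {px}, code length = {-math.log2(px):.2f} bits")

    # Lossy model: both strings map to one code, so only the sum is recoverable.
    p_merged = sum(p.values())
    print(f"lossy:    p(code) = {p_merged}, code length = {-math.log2(p_merged):.2f} bits")

The lossy code comes out a little shorter, but the model can no longer say which of the two strings it saw, or that one was three times as likely as the other -- that is exactly the extra knowledge the lossless model keeps.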

But let me give an example of what we have already learned from lossless compression tests.

1. PKZip, bzip2, ppmd, etc. model text at the character (ngram) level.
2. WinRK and paq8h model text at the lexical level using static dictionaries. They compress better than (1).
3. xml-wrt|ppmonstr and paq8hp1 model text at the lexical level using dictionaries learned from the input. They compress better than (2).

I think you can see the pattern.
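A rough sketch of the jump from (1) to (3): learn a word dictionary from the input and substitute short tokens before handing the bytes to an ordinary compressor. The file name is hypothetical, and the escaping needed for a truly lossless round trip is omitted -- this only illustrates the modeling level, not how xml-wrt or paq8hp1 actually work:

    import zlib
    from collections import Counter

    # Hypothetical input file -- substitute any large plain-text sample.
    text = open("sample.txt", "rb").read()

    # Character/ngram level: hand the raw bytes straight to a general compressor.
    raw = zlib.compress(text, 9)

    # Lexical level with a dictionary learned from the input: map the most
    # frequent words to short two-byte tokens before compressing.
    words = text.split()
    common = [w for w, _ in Counter(words).most_common(250)]
    codes = {w: bytes([0xFF, i]) for i, w in enumerate(common)}
    tokenized = b" ".join(codes.get(w, w) for w in words)
    lexical = zlib.compress(tokenized, 9)

    # (Escaping of 0xFF bytes and exact whitespace are ignored here, so this
    # version is not strictly invertible -- it only shows the idea.)
    print(len(text), len(raw), len(lexical))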

There has been research in semantic models using distant bigrams and LSA (latent semantic analysis). These compress cleaned text (restricted vocabulary, no punctuation) better than models without these capabilities, as measured by word perplexity. Currently there are no general-purpose compressors that model syntax or semantics, probably because such models are only useful on large text corpora, not the kind of files people normally compress. I think that will change if there is a financial incentive.
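For what it's worth, word perplexity and lossless code length are two views of the same number: a model's ideal compressed size is the sum of -log2 p(word) over the corpus, and perplexity is 2 raised to the average bits per word. A toy illustration -- the corpus and the unigram model here are just stand-ins for any word-level language model:

    import math
    from collections import Counter

    # Toy corpus; any tokenized word list would do.
    corpus = "the cat sat on the mat the dog sat on the log".split()

    # Unigram model estimated from the corpus itself (a stand-in for any
    # word-level language model).
    counts = Counter(corpus)
    n = len(corpus)
    prob = {w: c / n for w, c in counts.items()}

    # Ideal lossless code length under the model: sum of -log2 p(w) bits.
    bits = sum(-math.log2(prob[w]) for w in corpus)
    bits_per_word = bits / n

    # Perplexity is 2 ** (bits per word), so compressed size and perplexity
    # rank language models the same way.
    print(f"{bits:.1f} bits total, {bits_per_word:.2f} bits/word, "
          f"perplexity {2 ** bits_per_word:.2f}")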

This does not change the fact that lossless compression is the right way
to evaluate a language model.

. . . . in *your* opinion.  I might argue that it is the *easiest* way to
evaluate a language model but certainly NOT the best -- and I would then
argue, therefore, not the "right" way either.

Also in the opinion of speech recognition researchers studying language models since the early 1990's.

A lossy model cannot be evaluated objectively

Bullsh*t.  I've given you several examples of how.  You've discarded them
because you felt that they were "too difficult" and/or you didn't understand
them.

Deciding if a lossy decompression is "close enough" is an AI problem, or it requires subjective judging by humans. Look at benchmarks for video or audio codecs. Which sounds better, AAC or Ogg?

-- Matt Mahoney, [EMAIL PROTECTED]




