--- On Sun, 12/28/08, Philip Hunt <[email protected]> wrote:

> > Please remember that I am not proposing compression as
> > a solution to the AGI problem. I am proposing it as a
> > measure of progress in an important component (prediction).
> 
> Then why not cut out the middleman and measure prediction
> directly?

Because a compressor proves the correctness of the measurement software at 
essentially no additional cost in space, time, or code complexity: if the 
decompressor reproduces the input exactly, then the measured code lengths must 
be honest. The hard part of compression is modeling. Arithmetic coding is 
essentially a solved problem, and a decompressor uses exactly the same model as 
the compressor. In high-end compressors like PAQ, the arithmetic coder accounts 
for about 1% of the code, 1% of the CPU time, and less than 1% of the memory.
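To make the point concrete, here is a minimal sketch (not PAQ's model, just an 
adaptive order-0 byte model of my own for illustration) of how the model, not 
the coder, determines the compressed size. An ideal arithmetic coder emits 
-log2 P(symbol) bits per symbol; the decompressor runs the identical model and 
so reconstructs the same probabilities:

```python
import math

class AdaptiveOrder0:
    """Adaptive order-0 byte model: P(symbol) from counts seen so far,
    with Laplace smoothing so no symbol ever gets probability zero."""
    def __init__(self):
        self.counts = [1] * 256
        self.total = 256

    def prob(self, sym):
        return self.counts[sym] / self.total

    def update(self, sym):
        self.counts[sym] += 1
        self.total += 1

def ideal_code_length(data):
    """Bits an ideal arithmetic coder would emit when driven by the model.
    A decompressor running the same model sees the same probabilities at
    every step, so decoding is lossless by construction."""
    model = AdaptiveOrder0()
    bits = 0.0
    for sym in data:
        bits += -math.log2(model.prob(sym))  # coder cost for this symbol
        model.update(sym)                    # same update on both sides
    return bits

print(ideal_code_length(b"a"))  # 8.0 bits: -log2(1/256) for the first byte
```

A better model (higher probabilities on the symbols that actually occur) 
directly means a smaller output; the coding step itself never changes.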

In speech recognition research it is common to use word perplexity as a measure 
of language model quality; experimentally, it correlates well with word error 
rate. Perplexity is defined as 2^H, where H is the average number of bits 
needed to encode a word. Unfortunately it is sometimes computed in nonstandard 
ways, such as with restricted vocabularies and differing treatment of 
out-of-vocabulary words, parsing, stemming, capitalization, punctuation, 
spacing, and numbers. Because this additional data is not accounted for, 
published results are difficult to compare. Compression removes the possibility 
of such ambiguities, since every bit of the original text must be accounted for.
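The definition above is a one-liner; a small sketch, assuming the model's 
per-word probabilities on some test text are given:

```python
import math

def perplexity(word_probs):
    """Perplexity = 2^H, where H is the average number of bits per word,
    i.e. the mean of -log2 P(word) over the test text."""
    H = sum(-math.log2(p) for p in word_probs) / len(word_probs)
    return 2 ** H

# A model that assigns probability 1/4 to every word has H = 2 bits/word,
# hence perplexity 4:
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0
```

Note that H times the number of words is just the compressed size in bits, so 
ranking models by compressed size and ranking them by perplexity is the same 
thing, provided everyone compresses the same unrestricted text.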

-- Matt Mahoney, [email protected]



-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
