Re: Goertzel/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Mark Waser
Hi Ben, I agree with everything that you're saying; however, looking at the specific task: Create a compressed version (self-extracting archive) of the 100MB file enwik8 of less than 18MB. More precisely: Create a Linux or Windows executable archive8.exe of size S < L := 18'324'887 =
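
A minimal sketch, in Python, of the size test this rule implies; the archive file name is taken from the quoted rules, and the check itself is an illustration rather than the contest's official verification:

import os

L = 18_324_887  # the bound L from the quoted rules, in bytes

def meets_size_bound(archive_path: str) -> bool:
    # The self-extracting archive must be strictly smaller than L.
    return os.path.getsize(archive_path) < L

# e.g. meets_size_bound("archive8.exe")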

Re: Goertzel/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Ben Goertzel
Yes, but the compression software could have learned stuff before trying the Hutter Challenge, via compressing a bunch of other files ... and storing the knowledge it learned via this experience in its long-term memory... -- Ben On 8/15/06, Mark Waser [EMAIL PROTECTED] wrote: Hi Ben, I

Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Mark Waser
I don't see any point in this debate over lossless vs. lossy compression. Let's see if I can simplify it. The stated goal is compressing human knowledge. The exact same knowledge can always be expressed in a *VERY* large number of different bit strings. Not being able to reproduce

Re: **SPAM** Re: Goertzel/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Mark Waser
I think that our difference is that I am interpreting "without input from other sources" as not allowing that bunch of other files UNLESS that long-term memory is counted as part of the executable size. - Original Message - From: Ben Goertzel [EMAIL PROTECTED] To: agi@v2.listbox.com

Re: Goertzel/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread J. Storrs Hall, PhD.
On Tuesday 15 August 2006 09:03, Ben Goertzel wrote: Yes, but the compression software could have learned stuff before trying the Hutter Challenge, via compressing a bunch of other files ... and storing the knowledge it learned via this experience in its long-term memory... This could have a

Re: Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Matt Mahoney
I've read Charniak's book, Statistical Language Learning. A lot of researchers in language modeling are using perplexity (compression ratio) to compare models. But there are some problems with the way this is done. 1. Many evaluations are done on corpora from the LDC which are not free, like
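
For readers who have not seen the equivalence Matt mentions, a minimal sketch (toy probabilities, not from any real model): an ideal entropy coder spends -log2 p bits on a token the model assigns probability p, so cross-entropy in bits per token and perplexity are two views of the same achievable compressed size.

import math

def bits_and_perplexity(token_probs):
    # token_probs: probability the model assigned to each observed token
    bits = sum(-math.log2(p) for p in token_probs)  # ideal total code length
    h = bits / len(token_probs)                     # cross-entropy, bits/token
    return bits, 2 ** h                             # total bits, perplexity

print(bits_and_perplexity([0.5, 0.25, 0.125, 0.125]))  # (9.0, ~4.76)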

Re: [agi] confirmation paradox

2006-08-15 Thread Philip Goetz
A further example is: S1 = The fall of the Roman empire is due to Christianity. S2 = The fall of the Roman empire is due to lead poisoning. I'm not sure whether S1 or S2 is more true. But the question is: how can you define the meaning of the NTV associated with S1 or S2? If we can't, why

Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Matt Mahoney
I realize it is tempting to use lossy text compression as a test for AI because that is what the human brain does when we read text and recall it in paraphrased fashion. We remember the ideas and discard details about the expression of those ideas. A lossy text compressor that did the same thing

Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Mark Waser
1. The test is subjective. I disagree. If you have an automated test with clear criteria like the following, it will be completely objective: a) the compressing program must be able to output all inconsistencies in the corpus (in their original string form) AND b) the decompressing program

Re: [agi] confirmation paradox

2006-08-15 Thread Philip Goetz
On 8/15/06, Ben Goertzel [EMAIL PROTECTED] wrote: Phil, I see no conceptual problems with using probability theory to define context-dependent or viewpoint-dependent probabilities... Regarding YKY's example, causation is a subtle concept going beyond probability (but strongly probabilistically

Re: Goertzel/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Philip Goetz
On 8/15/06, Mark Waser [EMAIL PROTECTED] wrote: Ben Conceptually, a better (though still deeply flawed) contest would be: Compress this file of advanced knowledge, assuming as background knowledge this other file of elementary knowledge, in terms of which the advanced knowledge is defined.

Re: Goertzel/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Philip Goetz
On 8/15/06, Matt Mahoney [EMAIL PROTECTED] wrote: Ben wrote: Conceptually, a better (though still deeply flawed) contest would be: Compress this file of advanced knowledge, assuming as background knowledge this other file of elementary knowledge, in terms of which the advanced knowledge is

Re: Goetz/Goertzel/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Mark Waser
How about using OpenCyc? Actually, instructing the competitors to compress the OpenCyc corpus AND then the Wikipedia sample in sequence, measuring the size of both, *would* be an interesting and probably good contest. - Original Message - From: Philip Goetz [EMAIL PROTECTED]
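
A hedged sketch of what "in sequence" could mean operationally, with zlib's streaming API standing in for a real contest compressor and hypothetical file names:

import zlib

def sequential_sizes(first: bytes, second: bytes):
    co = zlib.compressobj(9)
    size_first = len(co.compress(first)) + len(co.flush(zlib.Z_SYNC_FLUSH))
    size_second = len(co.compress(second)) + len(co.flush())
    # The second size is measured with the first file already in the
    # compressor's context, so shared structure is paid for only once.
    return size_first, size_second

# opencyc = open("opencyc.txt", "rb").read()          # hypothetical paths
# wiki = open("wikipedia-sample.txt", "rb").read()
# print(sum(sequential_sizes(opencyc, wiki)))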

Re: Goetz/Goertzel/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Philip Goetz
On 8/15/06, Mark Waser [EMAIL PROTECTED] wrote: Actually, instructing the competitors to compress both the OpenCyc corpus AND then the Wikipedia sample in sequence and measuring the size of both *would* be an interesting and probably good contest. I think it would be more interesting for it to

Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Philip Goetz
I proposed knowledge-based text compression as a dissertation topic, back around 1991, but my advisor turned it down. I never got back to the topic because there wasn't any money in it - text is already so small, relative to audio and video, that it was clear that the money was in audio and

Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Matt Mahoney
You could use Keogh's compression dissimilarity measure to test for inconsistency: http://www.cs.ucr.edu/~eamonn/SIGKDD_2004_long.pdf CDM(x,y) = C(xy)/(C(x)+C(y)), where x and y are strings, and C(x) means the compressed size of x (lossless). The measure ranges from about 0.5 if x = y to about 1.0
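
A minimal Python rendering of the quoted formula, with zlib standing in for the compressor C (Keogh's paper leaves the choice of compressor open, so zlib is an assumption here):

import zlib

def C(s: bytes) -> int:
    return len(zlib.compress(s, 9))  # lossless compressed size, in bytes

def cdm(x: bytes, y: bytes) -> float:
    return C(x + y) / (C(x) + C(y))  # CDM(x,y) = C(xy) / (C(x) + C(y))

print(cdm(b"the quick brown fox " * 50, b"the quick brown fox " * 50))  # toward 0.5
print(cdm(b"the quick brown fox " * 50, bytes(range(256)) * 8))         # toward 1.0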

Re: [agi] confirmation paradox

2006-08-15 Thread Ben Goertzel
Hi, Phil wrote: There isn't a problem in doing it, but there's serious doubts whether an approach in which symbols have constant meanings (the same symbol has the same semantics in different propositions) can lead to AI. Sure, but neither Novamente nor NARS (for example) has the problematic

Re: Goetz/Goertzel/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Mark Waser
I think it would be more interesting for it to use the OpenCyc corpus as its knowledge for compressing the Wikipedia sample. The point is to demonstrate intelligent use of information, not to get a wider variety of data. :-) My assumption is that the compression program is building/adding to

Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Philip Goetz
On 8/15/06, Matt Mahoney [EMAIL PROTECTED] wrote: I realize it is tempting to use lossy text compression as a test for AI because that is what the human brain does when we read text and recall it in paraphrased fashion. We remember the ideas and discard details about the expression of those

Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Mark Waser
You could use Keogh's compression dissimilarity measure to test for inconsistency. I don't think so. Take the following strings: "I only used red and yellow paint in the painting", "I painted the rose in my favorite color", "My favorite color is pink", "Orange is created by mixing red and
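
Running a self-contained version of the CDM sketch above on three of these strings makes the objection concrete: the scores land in one undifferentiated middling band whether or not a pair is actually contradictory, because a generic compressor measures shared surface text, not logic.

import itertools, zlib

def cdm(x: bytes, y: bytes) -> float:
    c = lambda s: len(zlib.compress(s, 9))
    return c(x + y) / (c(x) + c(y))

statements = [
    b"I only used red and yellow paint in the painting",
    b"I painted the rose in my favorite color",
    b"My favorite color is pink",
]
for a, b in itertools.combinations(statements, 2):
    print(round(cdm(a, b), 2))  # similar scores for consistent and inconsistent pairs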

Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Matt Mahoney
Mark wrote: Huh? By definition, the compressor with the best language model is the one with the highest compression ratio. I'm glad we finally agree :-) You could use Keogh's compression dissimilarity measure to test for inconsistency. I don't think so. Take the following strings: "I only used

Re: Goetz/Goertzel/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Philip Goetz
On 8/15/06, Mark Waser [EMAIL PROTECTED] wrote: I think it would be more interesting for it to use the OpenCyc corpus as its knowledge for compressing the Wikipedia sample. The point is to demonstrate intelligent use of information, not to get a wider variety of data. :-) My assumption is