AIXItl; Wolfram's hypothesis (was Re: [agi] How valuable is Solomonoff Induction for real world AGI?)
From: Lukasz Stafiniak [EMAIL PROTECTED]
> The programs are generally required to exactly match in AIXI (but not in AIXItl, I think).

I'm pretty sure AIXItl wants an exact match too. There isn't anything there that lets the theoretical AI guess probability distributions and then get scored based on how probable the actual world is according to that distribution -- each hypothesis is either right or wrong, and wrong hypotheses are discarded. The reference I use for AIXItl is: http://www.hutter1.net/ai/aixigentle.htm

On Nov 9, 2007 5:26 AM, Edward W. Porter [EMAIL PROTECTED] wrote:
> Are these short codes sort of like Wolfram little codelettes, that can hopefully represent complex patterns out of very little code, or do they pretty much represent subsets of visual patterns as small bit maps?

From: Lukasz Stafiniak [EMAIL PROTECTED]
> It depends on reality, whether the reality supports Wolfram's hypothesis.

I'm guessing you mean the Principle of Computational Equivalence, as defined at: http://mathworld.wolfram.com/PrincipleofComputationalEquivalence.html

He's saying that "systems found in the natural world can perform computations up to a maximal (universal) level of computational power". All the AIXI family needs to be near-optimal is for the probability distribution of possible outcomes to be computable. I couldn't quickly tell whether Wolfram is saying that the actual outcomes are computable, or just the probabilities of the outcomes.

-- Tim Freeman http://www.fungible.com [EMAIL PROTECTED]
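[Editor's illustration.] To make "wrong hypotheses are discarded" concrete, here is a minimal toy sketch in Python (my illustration, not Hutter's construction): a Solomonoff-style predictor keeps only programs whose output exactly matches the history so far, weights each survivor by 2^-length (the toy analogue of the universal prior), and predicts by weighted vote. The hand-rolled "programs" and their bit lengths are hypothetical stand-ins for real machine codes.

    # Toy Solomonoff-style mixture. Hypotheses are (name, length_in_bits, generator)
    # triples; the generator maps a time step to a predicted bit. Real Solomonoff
    # induction enumerates all programs of a universal machine -- this is a sketch.
    HYPOTHESES = [
        ("all_zeros",   3, lambda t: 0),
        ("all_ones",    3, lambda t: 1),
        ("alternating", 5, lambda t: t % 2),
        ("period_3",    8, lambda t: 1 if t % 3 == 0 else 0),
    ]

    def predict_next(history):
        """P(next bit = 1): each program gets weight 2^-length if it exactly
        reproduces the history, and is discarded (weight 0) on any mismatch."""
        weight_one = total = 0.0
        for _name, length, gen in HYPOTHESES:
            if all(gen(t) == bit for t, bit in enumerate(history)):  # exact match
                w = 2.0 ** -length
                total += w
                weight_one += w * gen(len(history))
            # a single disagreement with history removes the hypothesis entirely
        return weight_one / total if total else 0.5

    print(predict_next([0, 1, 0, 1]))  # only "alternating" survives -> prints 0.0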
Re: AIXItl; Wolfram's hypothesis (was Re: [agi] How valuable is Solomonoff Induction for real world AGI?)
On Nov 10, 2007 4:47 PM, Tim Freeman [EMAIL PROTECTED] wrote:
> From: Lukasz Stafiniak [EMAIL PROTECTED]
> > The programs are generally required to exactly match in AIXI (but not in AIXItl, I think).
>
> I'm pretty sure AIXItl wants an exact match too. There isn't anything there that lets the theoretical AI guess probability distributions and then get scored based on how probable the actual world is according to that distribution -- each hypothesis is either right or wrong, and wrong hypotheses are discarded.

I agree that I misinterpreted the meaning of "exact match". AIXItl uses strategies whose outputs do not need to agree with history.
Re: [agi] How valuable is Solomonoff Induction for real world AGI?
On Nov 9, 2007 5:26 AM, Edward W. Porter [EMAIL PROTECTED] wrote:
> ED ## What is the value or advantage of conditional complexities relative to conditional probabilities?

Kolmogorov complexity is universal. For probabilities, you need to specify the probability space and an initial distribution over this space.

> ED ## What's a TM?

(Turing Machine, or a code for a universal Turing machine = a program...)

> Also, are you saying that the system would develop programs for matching patterns, and then patterns for modifying those patterns, etc., so that similar patterns would be matched by programs that called a routine for a common pattern, but then other patterns to modify them to fit different perceptions?

Yes, these programs will be compact descriptions of data once enough data gets collected, so their (posterior) probability will grow with time. But the most probable programs will be very cryptic, without redundancy to make the structure evident.

> So are the programs just used for computing Kolmogorov complexity, or are they also used for generating and matching patterns?

It is difficult to say: in AIXI, the direct operation is governed by the expectimax algorithm, but the algorithm works on the future (it is derived from the Solomonoff predictor). Hutter mentions an alternative model, AIXI_alt, which models actions the same way as the environment...

> Does it require that the programs exactly match a current pattern being received, or does it know when a match is good enough that it can be relied upon as having some significance?

It is automatic: when you have a program with a good enough match, then you can parameterize it over the difference and apply it twice, thus saving code. Remember that the programs need to represent the whole history.

> Can the programs learn that similar but different patterns are different views of the same thing? Can they learn a generalizational and compositional hierarchy of patterns?

With an exegetic enough interpretation... I will comment on further questions in a few hours.
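[Editor's note.] Lukasz's "saving the code" remark is the standard subadditivity property of Kolmogorov complexity; a hedged sketch of the bound (constants depend on the reference machine):

    K(x, y) \le K(x) + K(y \mid x) + O(\log K(x, y))

So if a second pattern y differs from an already-coded pattern x by a simple transformation d, then K(y | x) is roughly K(d), and coding the pair costs far less than K(x) + K(y): the shared routine is paid for once and "applied twice".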
RE: [agi] How valuable is Solomonoff Induction for real world AGI?
Thank you for your reply. I want to take some time to compare this with the reply I got from Shane Legg, and get back to you when I have more time to think about it.

Edward W. Porter
Porter Associates
24 String Bridge S12
Exeter, NH 03833
(617) 494-1722 Fax (617) 494-1822
[EMAIL PROTECTED]

-----Original Message-----
From: Lukasz Stafiniak [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 09, 2007 7:13 AM
To: agi@v2.listbox.com
Subject: Re: [agi] How valuable is Solomonoff Induction for real world AGI?

On Nov 9, 2007 5:26 AM, Edward W. Porter [EMAIL PROTECTED] wrote:
> ED ## What is the value or advantage of conditional complexities relative to conditional probabilities?

Kolmogorov complexity is universal. For probabilities, you need to specify the probability space and an initial distribution over this space.

> ED ## What's a TM?

(Turing Machine, or a code for a universal Turing machine = a program...)

> Also, are you saying that the system would develop programs for matching patterns, and then patterns for modifying those patterns, etc., so that similar patterns would be matched by programs that called a routine for a common pattern, but then other patterns to modify them to fit different perceptions?

Yes, these programs will be compact descriptions of data once enough data gets collected, so their (posterior) probability will grow with time. But the most probable programs will be very cryptic, without redundancy to make the structure evident.

> So are the programs just used for computing Kolmogorov complexity, or are they also used for generating and matching patterns?

It is difficult to say: in AIXI, the direct operation is governed by the expectimax algorithm, but the algorithm works on the future (it is derived from the Solomonoff predictor). Hutter mentions an alternative model, AIXI_alt, which models actions the same way as the environment...

> Does it require that the programs exactly match a current pattern being received, or does it know when a match is good enough that it can be relied upon as having some significance?

It is automatic: when you have a program with a good enough match, then you can parameterize it over the difference and apply it twice, thus saving code. Remember that the programs need to represent the whole history.

> Can the programs learn that similar but different patterns are different views of the same thing? Can they learn a generalizational and compositional hierarchy of patterns?

With an exegetic enough interpretation... I will comment on further questions in a few hours.
Re: [agi] How valuable is Solomonoff Induction for real world AGI?
On Nov 9, 2007 5:26 AM, Edward W. Porter [EMAIL PROTECTED] wrote:
> So are the programs just used for computing Kolmogorov complexity, or are they also used for generating and matching patterns?

The programs do not compute K complexity; they (their length) _are_ (a variant of) Kolmogorov complexity. The programs compute (predict) the environment.

> Does it require that the programs exactly match a current pattern being received, or does it know when a match is good enough that it can be relied upon as having some significance?

The programs are generally required to exactly match in AIXI (but not in AIXItl, I think). But the significance is provided by the compression of the representation of similar things, which favors the same sort of similarity in the future.

> Can they run on massively parallel processing?

I think they can... In AIXI, you would build a summation tree for the posterior probability.

> Hutter's expectimax tree appears to alternate levels of selection and evaluation. Can the expectimax tree run in reverse and in parallel, with information coming up from low sensory levels, and then being selected based on their relative probability, and then having the selected lower-level patterns being fed as inputs into higher-level patterns, and then repeating that process? That would be a hierarchy that alternates matching and then selecting the best-scoring match at alternate levels of the hierarchy, as is shown in the Serre article I have cited so many times before on this list.

To be optimal, the expectimax must be performed chronologically from the end of the horizon (dynamic programming principle: close to the end of the time horizon, you have smaller planning problems -- fewer opportunities; from smaller solutions to smaller problems you build bigger solutions backwards in time). But the probabilities are conditional on all current history, including low sensory levels. (Generally, your comment above doesn't make much sense in the AIXI context.)

> ED ## Are these short codes sort of like Wolfram little codelettes, that can hopefully represent complex patterns out of very little code, or do they pretty much represent subsets of visual patterns as small bit maps?

It depends on reality: whether the reality supports Wolfram's hypothesis.

Best Regards.
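[Editor's illustration.] Lukasz's point about working backwards from the horizon is ordinary finite-horizon dynamic programming; a minimal sketch in Python (toy two-action environment with made-up outcome probabilities, nothing AIXI-specific):

    # Toy expectimax over a finite horizon. ACTIONS, OUTCOMES, and the
    # transition probabilities are hypothetical placeholders; AIXI's real
    # version mixes the belief over all environment programs.
    ACTIONS = ["a0", "a1"]
    OUTCOMES = [0, 1]

    def prob(outcome, history, action):
        # Stand-in environment belief: a slight bias depending on the action.
        return 0.7 if (outcome == 1) == (action == "a1") else 0.3

    def reward(outcome):
        return float(outcome)

    def value(history, steps_left):
        """Expectimax from the end of the horizon backwards: the base case at
        the horizon is trivial, and larger problems are built on top of it."""
        if steps_left == 0:
            return 0.0
        return max(
            sum(prob(o, history, a) * (reward(o) + value(history + [(a, o)], steps_left - 1))
                for o in OUTCOMES)
            for a in ACTIONS)

    def best_action(history, steps_left):
        return max(ACTIONS, key=lambda a: sum(
            prob(o, history, a) * (reward(o) + value(history + [(a, o)], steps_left - 1))
            for o in OUTCOMES))

    print(best_action([], steps_left=3))  # -> "a1" under the biased toy belief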
RE: [agi] How valuable is Solomonoff Induction for real world AGI?
Jef,

(To make it easier to know who is responding to whom, if any of this is cut into postings by others, I have inserted a marker before "JEF ##" to indicate his comments occurred first in time.)

JEF ## Edward, can you explain what you might have meant by "based on the likelihood that the probability..."?

ED ## I think my statement -- "Dragon selected speech recognition word candidates based on the likelihood that the probability distribution of their model matched the acoustic evidence" -- maps directly into your statement that "likelihood is simply the probability of some data". The probability of some data given a model's probability distribution can, I think, properly be considered a match between the distribution of the data and the distribution of the model. Maybe in Jef-speak that is not a proper use of the word "match", but I think in normal parlance, even in computer science, it is. Correct me if I am wrong. Remember, at Dragon we were scoring multiple different word models' probability distributions against the acoustic data, and those scores were considered to indicate a degree of match between the model and the data.

JEF ## "Given all the relevant parameters" is key, and implies objectivity. Without all the relevant parameters of the likelihood function, you are left with probability, which is inherently subjective. When you said "based on the likelihood that the probability", it seemed that you were somehow (?) confusing the subjective with the objective, which in my opinion is a theme running through this entire thread.

ED ## According to the above statement, all likelihood functions that are computable are subjective, and thus, by your definition, just probabilities. This is because it is impossible for a computable likelihood function to include all possibly relevant parameters. No computable system knows enough about the world to know what the relevant parameters are. There could always be an as-yet un-modeled glitch in the Matrix. Thus, your implication that I had somehow confused the correct definition of likelihood, which would have it be objective, with one that was subjective (because it did not use all relevant parameters), would seem to be a crime committed by any person who has ever talked about actual likelihood calculations (which would include a majority of the people in the field). Again, my offense seems to be using words as most in the field do, rather than in strict adherence to Jef-speak.

JEF ## How does this map onto your difficulty grasping the significance of Solomonoff induction? Solomonoff induction is an idealized description of learning by a subjective agent interacting with an objective (actually "consistent" might be more accurate here) reality.

ED ## Finally, I am learning what our whole back-and-forth has been about. I wish our correspondence had included more sentences like this earlier on. But if I am guilty of using likelihoods in a way that sullies them by making them subjective, how does that make them any worse than Solomonoff induction? According to the above, isn't it guilty of the same lack of purity, because it is describing learning by a subjective agent? Or are you claiming Solomonoff induction is an objective description of a subjective thing? Words are often stretched so far (although I thought not in Jef-speak). But if Solomonoff induction is based on generalizations assuming knowledge about things it can never know, how is it any less subjective than a likelihood function calculated without all relevant parameters?

Does pretending we know everything about reality make our understanding of it any less subjective? Pretending can allow some useful thought experiments, but are they objective? Are mathematical proofs objective? How do we know they are based on all the relevant parameters? Isn't math just a creation in our heads, and thus subjective? Yes, scientific evidence suggests it describes some real things in the real world really well, but that is all based on sensation, and that, according to you, is subjective.

Ed Porter

P.S. Since your hobby is collecting paradoxes, if you have a few that are either particularly insightful or amusing (and hopefully only a sentence or two long), please feel free to share.

-----Original Message-----
From: Jef Allbright [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 09, 2007 2:46 PM
To: agi@v2.listbox.com
Subject: Re: [agi] How valuable is Solomonoff Induction for real world AGI?

On 11/8/07, Edward W. Porter [EMAIL PROTECTED] wrote:
> ED Most importantly, you say my alleged confusion between subjective and objective maps into my difficulty to grasp the significance of Solomonoff induction. If you could do so, please explain what you mean.

Given our significantly disjoint backgrounds, the best I hoped for was to point out where you're not going to get a good answer because you're not asking a good question
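[Editor's illustration.] For readers outside speech recognition, the likelihood-versus-probability distinction being argued over fits in a few lines of Python; a minimal sketch with made-up Gaussian "word models" standing in for Dragon's real acoustic models:

    # Likelihood scoring of word models against acoustic evidence. The two
    # Gaussian "models" and the frame values are hypothetical stand-ins.
    import math

    def gaussian_pdf(x, mean, var):
        return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    # Each word model is a probability distribution over acoustic frame values.
    WORD_MODELS = {"cat": (1.0, 0.5), "cot": (2.0, 0.5)}
    frames = [0.9, 1.2, 1.1]  # observed acoustic evidence

    # likelihood(model) = P(data | model): a function of the model, data held fixed.
    likelihoods = {
        word: math.prod(gaussian_pdf(f, mean, var) for f in frames)
        for word, (mean, var) in WORD_MODELS.items()
    }
    best = max(likelihoods, key=likelihoods.get)
    print(best, likelihoods)  # "cat" scores higher on these frames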
Re: [agi] How valuable is Solomonoff Induction for real world AGI?
I recently found this paper to contain some thinking worthwhile to the considerations in this thread.

http://lcsd05.cs.tamu.edu/papers/veldhuizen.pdf

- Jef
Re: [agi] How valuable is Solomonoff Induction for real world AGI?
> Is there any research that can tell us what kind of structures are better for machine learning? Or perhaps w.r.t. a certain type of data? Are there learning structures that will somehow learn things faster?

There is plenty of knowledge about which learning algorithms are better for which problem classes. For example, there are problems known to be deceptive (not efficiently solvable) for genetic programming, yet known to be efficiently solvable by MOSES, the probabilistic program learning method used in Novamente (from Moshe Looks' PhD thesis; see metacog.org).

> Note that, if the answer is negative, then the choice of learning structures is arbitrary and we should choose the most developed / heavily researched ones (such as first-order logic).

The choice is not at all arbitrary; but the knowledge we have to guide the choice is currently very incomplete. So one has to make the right intuitive choice by integrating the available information. This is part of why AGI is hard at the current level of development of computer science.

-- Ben G
Re: [agi] How valuable is Solomonoff Induction for real world AGI?
My impression is that most machine learning theories assume a search space of hypotheses as a given, so it is out of their scope to compare *between* learning structures (e.g., between logic and neural networks). Algorithmic learning theory -- I don't know much about it -- may be useful because it does not assume a learning structure a priori (except that of a Turing machine), but then the algorithmic complexity is incomputable.

Is there any research that can tell us what kind of structures are better for machine learning? Or perhaps w.r.t. a certain type of data? Are there learning structures that will somehow learn things faster?

Note that, if the answer is negative, then the choice of learning structures is arbitrary and we should choose the most developed / heavily researched ones (such as first-order logic).

YKY
Re: [agi] How valuable is Solomonoff Induction for real world AGI?
Thanks for the input. There's one perplexing theorem, in the paper about the algorithmic complexity of programming, that the language doesn't matter that much, i.e., the algorithmic complexity of a program in different languages differs only by a constant. I've heard something similar about the choice of Turing machine affecting the Kolmogorov complexity only by a constant. (I'll check out the proof of this one later.)

But it seems to suggest that the choice of the AGI's KR doesn't matter. It can be logic, neural network, or Java? That's kind of a strange conclusion...

YKY
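[Editor's note.] The theorem YKY refers to is the invariance theorem of Kolmogorov complexity; in standard notation (the constant is essentially the length of a translator between the two machines, as Will Pearson notes below):

    \lvert K_U(x) - K_V(x) \rvert \;\le\; c_{U,V}  \quad \text{for all strings } x,

where U and V are universal machines and c_{U,V} is independent of x -- roughly the length of an interpreter for V written for U. So "constant" can still be enormous in practice.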
Re: [agi] How valuable is Solomonoff Induction for real world AGI?
On 08/11/2007, YKY (Yan King Yin) [EMAIL PROTECTED] wrote:
> My impression is that most machine learning theories assume a search space of hypotheses as a given, so it is out of their scope to compare *between* learning structures (e.g., between logic and neural networks). Algorithmic learning theory -- I don't know much about it -- may be useful because it does not assume a learning structure a priori (except that of a Turing machine), but then the algorithmic complexity is incomputable. Is there any research that can tell us what kind of structures are better for machine learning?

Not if all problems are equiprobable: http://en.wikipedia.org/wiki/No_free_lunch_in_search_and_optimization

However, this is unlikely in the real world. It does, however, teach an important lesson: put as much of the information you have about the problem domain into the algorithm and representation as possible, if you want to be at all efficient.

This form of learning is only a very small part of what humans do when we learn things. For example, when we learn to play chess, we are told or read the rules of chess and the winning conditions. This allows us to create tentative learning strategies/algorithms that are much better than random at playing the game, and that also give us good information about the game. This is how we generally deal with combinatorial explosions. Consider a probabilistic learning system based on statements about the real world TM: without this ability to alter how it learns and what it tries, it would be looking at the probability of whether a bird tweeting is correlated with its opponent winning, and also trying to figure out whether emptying an ink well over the board is a valid move.

I think Marcus Hutter has a bit somewhere in his writings about how slow AIXI would be at learning chess, due to only getting a small amount of information (1 bit?) per game about the problem domain. My memory might be faulty, and I don't have time to dig at the moment.

> Or perhaps w.r.t. a certain type of data? Are there learning structures that will somehow learn things faster?

Thinking in terms of fixed learning structures is IMO a mistake. Interestingly, AIXI doesn't have fixed learning structures per se, even though it might appear to. Because it stores the entire history of the agent and feeds it to each program under evaluation, each of these programs may be a learning program and be able to create learning strategies from that data. You would have to wait a long time for these types of programs to become the most probable if a good prior was not given to the system, though.

Will Pearson
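[Editor's note.] The No Free Lunch theorem Will links to can be stated compactly (after Wolpert and Macready, loosely): for any two search algorithms a_1 and a_2,

    \sum_{f} P(d^y_m \mid f, m, a_1) \;=\; \sum_{f} P(d^y_m \mid f, m, a_2),

where the sum runs over all objective functions f and d^y_m is the sequence of m sampled values. Averaged uniformly over all possible problems, no algorithm outperforms any other, which is why injecting domain knowledge into the algorithm and representation is the only route to efficiency.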
RE: [agi] How valuable is Solomonoff Induction for real world AGI?
VLADIMIR NESOV IN HIS 11/07/07 10:54 PM POST SAID:

VLADIMIR Hutter shows that the prior can be selected rather arbitrarily without giving up too much.

ED Yes. I was wondering why the Solomonoff Induction paper made such a big stink about picking the prior (and then came up with a choice that struck me as being quite sub-optimal in most of the types of situations humans deal with). After you have a lot of data, you can derive the equivalent of the prior from frequency data. As the Solomonoff Induction paper showed, using Bayesian formulas, the effect of the prior fades off fairly fast as data comes in. (However, I have read that for complex probability distributions the choice of the class of mathematical model you use to model the distribution is part of the prior-choosing issue, and can be important, but that did not seem to be addressed in the Solomonoff Induction paper. For example, in some speech recognition, each speech frame model has a pre-selected number of dimensions, such as FFT bins (or related signal-processing derivatives), and each dimension is represented not by a Gaussian but rather by a basis function comprised of a set of a selected number of Gaussians.)

It seems to me that when you don't have much frequency data, we humans normally make a guess based on the probability of similar things, as suggested in the Kemp paper I cited. It seems to me that is by far the most commonsensical approach. In fact, due to the virtual omnipresence of non-literal similarity in everything we see and hear (e.g., the same face virtually never hits V1 exactly the same), most of our probabilistic thinking is dominated by similarity-derived probabilities.

BEN GOERTZEL WROTE IN HIS Thu 11/8/2007 6:32 AM POST:

BEN [referring to Vlad's statement about AIXI's uncomputability] Now now, it doesn't require infinite resources -- the AIXItl variant of AIXI only requires an insanely massive amount of resources, more than would be feasible in the physical universe, but not an infinite amount ;-)

ED So, from a practical standpoint, which is all I really care about, is it a dead end? Also, do you, or anybody, know if Solmononoff (the only way I can remember the name is "Soul man on off", like Otis Redding with a microphone problem) Induction has the ability to deal with deep forms of non-literal similarity matching in its complexity calculations. And if so, how? And if not, isn't it brain dead? And if it is brain dead, why is such a bright guy as Shane Legg spending his time on it?

YKY (YAN KING YIN) IN HIS 11/8/2007 9:16 AM POST SAID:

YKY Is there any research that can tell us what kind of structures are better for machine learning? Or perhaps w.r.t. a certain type of data? Are there learning structures that will somehow learn things faster?

ED Yes: brain science. It may not point out the best possible architecture, but it points out one that works. Evolution is not theoretical, and not totally optimal, but it is practical. Systems like Novamente, which is loosely based on many key ideas from brain science, probably have a much more likely chance of getting useful stuff up and running soon than any more theoretical approaches, because the search space has already been narrowed by many trillions of trials and errors over hundreds of millions of years.

Ed Porter
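[Editor's illustration.] Ed's claim that "the effect of the prior fades off fairly fast as data comes in" is easy to demonstrate with a conjugate model; a minimal Python sketch (the two priors and the 0.6 "true rate" are arbitrary choices for illustration):

    # Two very different Beta priors converge to nearly the same posterior mean
    # once frequency data accumulates: Bayesian updating washes out the prior.

    def posterior_mean(alpha, beta, heads, tails):
        # Beta(alpha, beta) prior + Bernoulli data -> Beta(alpha+heads, beta+tails)
        return (alpha + heads) / (alpha + beta + heads + tails)

    optimist = (8, 2)  # prior mean 0.8
    skeptic  = (2, 8)  # prior mean 0.2

    for n in (0, 10, 100, 1000):
        heads = int(0.6 * n)          # data with true rate ~0.6
        tails = n - heads
        print(n,
              round(posterior_mean(*optimist, heads, tails), 3),
              round(posterior_mean(*skeptic,  heads, tails), 3))
    # With n = 1000, both posteriors sit near 0.6 despite opposite priors.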
RE: [agi] How valuable is Solomonoff Induction for real world AGI?
From: Jef Allbright [mailto:[EMAIL PROTECTED]]
> I recently found this paper to contain some thinking worthwhile to the considerations in this thread. http://lcsd05.cs.tamu.edu/papers/veldhuizen.pdf

This is an excellent paper, not only on the subject of code reuse but also on the techniques and tools used to tackle such a complicated issue. Code reuse is related to code generation, which some AGIs would make use of, or to any other type of language generation, formal or otherwise.

John
RE: [agi] How valuable is Solomonoff Induction for real world AGI?
Jef,

The paper cited below is more relevant to Kolmogorov complexity than Solomonoff induction. I had thought about the use of subroutines before I wrote my questioning critique of Solomonoff induction. Nothing in it seems to deal with the fact that the descriptive length of reality's computations that create an event (the descriptive length that is more likely to affect the event's probability) is not necessarily correlated with the descriptive length of the sensations we receive from such events.

Nor is it clear that it deals with the fact that much of the frequency data a world-sensing brain derives its probabilities from is full of non-literal similarity, meaning that non-literal matching is a key component of any capable AGI. It does not indicate how the complexity of that non-literal matching, at the sensation level rather than the reality-generating level, is to be dealt with by Solomonoff induction: is it part of the complexity involved in its hypotheses (or semi-measures) or not, and to what extent, if any, should it be?

With regard to the paper you cited, I disagree with its statement that the measure of the complexity of a program written using a library should be the size of the program plus the size of the library it uses. Presumably this was a mis-statement, because it would make all but the very largest programs that used the same vast library relatively close in size, regardless of the relative complexity of what they do. I assume it really should be the length of the program plus only each of the library routines it actually uses, independent of how many times it uses them. Anything else would mean that...

To make this discussion relevant to practical AGI, let's assume the program from which Kolmogorov complexity is computed is a Novamente-class machine up and running with world knowledge in, say, five to ten years. Assume the system has compositional and generalizational hierarchies providing it with the representational efficiencies Jeff Hawkins describes for hierarchical memory. In such a system, much of what determines what happens lies in its knowledge base; I assume the length of any knowledge base components used would also have to be counted in the Kolmogorov complexity. But would one count only the knowledge structures actually found to match, or also the ones that were match candidates but lost out, when calculating such complexity? Any ideas?

Ed Porter

-----Original Message-----
From: Jef Allbright [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 08, 2007 9:56 AM
To: agi@v2.listbox.com
Subject: Re: [agi] How valuable is Solomonoff Induction for real world AGI?

I recently found this paper to contain some thinking worthwhile to the considerations in this thread.

http://lcsd05.cs.tamu.edu/papers/veldhuizen.pdf

- Jef
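[Editor's note.] A sketch of the counting rule Ed proposes, in notation of mine (not from the cited paper): the complexity charged to a program p written against a library L would be

    C(p, L) \;=\; \ell(p) \;+\; \sum_{r \,\in\, \mathrm{used}(p, L)} \ell(r),

counting each routine actually used once, regardless of how many times it is called, versus the rejected measure \ell(p) + \ell(L), which would make all small programs over the same vast library nearly equal in complexity.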
Re: [agi] How valuable is Solomonoff Induction for real world AGI?
On 08/11/2007, YKY (Yan King Yin) [EMAIL PROTECTED] wrote:
> Thanks for the input. There's one perplexing theorem, in the paper about the algorithmic complexity of programming, that the language doesn't matter that much, i.e., the algorithmic complexity of a program in different languages differs only by a constant. I've heard something similar about the choice of Turing machine affecting the Kolmogorov complexity only by a constant. (I'll check out the proof of this one later.)

This only works if the languages are Turing complete, so that they can append, in front of the non-native program, a description of a program that converts from the language in question to their native one. Also, "constant" might not mean negligible: 2^^^9 is a constant (where ^ is Knuth's up-arrow notation).

> But it seems to suggest that the choice of the AGI's KR doesn't matter. It can be logic, neural network, or Java? That's kind of a strange conclusion...

Only some neural networks are Turing complete. First-order logic should be; propositional logic, not so much.

Will Pearson
Re: [agi] How valuable is Solomonoff Induction for real world AGI?
On 08/11/2007, Jef Allbright [EMAIL PROTECTED] wrote:
> I'm sorry I'm not going to be able to provide much illumination for you at this time. Just the few sentences of yours quoted above, while of a level of comprehension equal or better than average on this list, demonstrate epistemological incoherence to the extent I would hardly know where to begin. This discussion reminds me of hot rod enthusiasts arguing passionately about how to build the best racing car, while denigrating any discussion of entropy as outside the practical.

You are overstating the case, majorly. Entropy can be used to make predictions about chemical reactions and to help design systems. UAI has yet to prove its usefulness. It is just a mathematical formalism that is incomplete in a number of ways:

1) It doesn't treat computation as outputting to the environment, and thus can have no concept of saving energy or of avoiding interference with other systems by avoiding computation. The lack of energy saving means it is not a valid model for solving the problem of being a non-reversible intelligence in an energy-poor environment (which humans are in, and most mobile robots will be).

2) It is based on Sequential Interaction Machines, rather than Multi-Stream Interaction Machines, which means it might lose out on expressiveness, as discussed here: http://www.cs.brown.edu/people/pw/papers/bcj1.pdf

It is the first step on an interesting path, but it is too divorced from what computation actually is for me to consider it equivalent to the entropy of AI.

Will Pearson
RE: [agi] How valuable is Solomonoff Induction for real world AGI?
Derek,

Thank you. I think the list should be a place where people can debate and criticize ideas, but I think poorly reasoned and insulting flames like Jef's are not helpful, particularly if they are driving potentially valuable contributors like you off the list. Luckily, such flames are relatively rare. So I think we should criticize such flames when they do occur, so there are fewer of them, and try to have tougher skin, so that when they occur we don't get too upset by them (and so that we don't ourselves get drawn into flame mode by tough but fair criticisms).

I just re-read my email that sent Jef into such a tizzy. Although it was not hostile, it was not as tactful as it could have been. In my attempt to respond quickly, I did not intend to attack him or his paper (other than one apparent mis-statement in it), but rather to say it didn't relate to the issue I was specifically interested in. I actually thought it was quite an interesting article. I wish in hindsight I had said so. At the end of the post that upset Jef, I actually asked how the Kolmogorov complexity measure his paper discussed might be applied to a given type of AGI. That was my attempt to acknowledge the importance of what the paper dealt with.

Let's just hope going forward most people can take fair attempts to debate, question, or attack their ideas without flipping out, and I guess we all should spend an extra 5% more time in our posts trying to be tactful. And let's just hope more people like you start contributing again.

Ed Porter

-----Original Message-----
From: Derek Zahn [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 08, 2007 3:05 PM
To: agi@v2.listbox.com
Subject: RE: [agi] How valuable is Solomonoff Induction for real world AGI?

Edward, For some reason, this list has become one of the most hostile and poisonous discussion forums around. I admire your determined effort to hold substantive conversations here, and hope you continue. Many of us have simply given up.
RE: [agi] How valuable is Solomonoff Induction for real world AGI?
Jef,

In your below flame you spent much more energy conveying contempt than knowledge. Since I don't have time to respond to all of your attacks, let us, for example, just look at the last two:

MY PRIOR POST: "...affect the event's probability..."

JEF'S PUT-DOWN 1: More coherently, you might restate this as "...reflect the event's likelihood..."

MY COMMENT: At Dragon Systems, then one of the world's leading speech recognition companies, I was repeatedly told by our in-house PhD in statistics that likelihood is the measure of a hypothesis matching, or being supported by, evidence. Dragon selected speech recognition word candidates based on the likelihood that the probability distribution of their model matched the acoustic evidence provided by an event, i.e., a spoken utterance. Similarly, if one is drawing balls from a bag that has a distribution of black and white balls, the color of a ball produced by a given random drawing from the bag has a probability based on the distribution in the bag. But one uses the likelihood function to determine the probability distribution in the bag, from among multiple possible distribution hypotheses, by how well each distribution matches data produced by events. The creation of an event by external reality that I was referring to is much more like the utterance of a word (and its observation), or the drawing of a ball (and the observation of its color) from a bag, than like a determination of what probability distribution most likely created it. On the other hand, trying to understand the complexity that gives rise to such an event would be more like using likelihoods. When I wrote the post you have flamed, I was not worrying about trying to be exact, because I had no idea people on this list wasted their energy correcting such common inexact usages as switching "probability" and "likelihood". But as it turns out, in this case my selection of the word "probability" rather than "likelihood" seems to be totally correct.

MY PRIOR POST: "...the descriptive length of sensations we receive..."

JEF'S PUT-DOWN 2: Who is this "we" that receives sensations? Holy homunculus, Batman, seems we have a bit of qualia confusion thrown into the mix!

MY COMMENT: Again, I did not know that I would be attacked for using such a common English usage as "we" on this list. Am I to assume that you, Jef, never use the words "we" or "I" because you are surrounded by friends so kind as to rudely say "Holy homunculus, Batman" every time you do? Or, just perhaps, are you a little more normal than that? In addition, the use of the word "we", or even "I", does not necessarily imply a homunculus. I think most modern understanding of the brain indicates that human consciousness is most probably -- although richly interconnected -- a distributed computation that does not require a homunculus. I like and often use Bernard Baars' Theater of Consciousness metaphor. But none of this means it is improper to use the words "we" or "I" when referring to ourselves or our consciousnesses. And I think one should be allowed to use the word "sensation" without being accused of qualia confusion. Jef, do you ever use the word "sensation", or would that be too confusing for you?

So, Jef, if Solomonoff induction is really a concept that can help me get a more coherent model of reality, I would really appreciate someone who had the understanding, intelligence, and friendliness to at least try, in relatively simple words, to give me pointers as to how and why it is so important, rather than someone who picks apart every word I say with minute or often incorrect criticisms. A good example of the type of friendly effort I appreciate is Lukasz Stafiniak's 11/08/07 11:54 AM post to me, which I have not had time yet to fully understand, but which I greatly appreciate for its focused and helpful approach.

Ed Porter

-----Original Message-----
From: Jef Allbright [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 08, 2007 12:55 PM
To: agi@v2.listbox.com
Subject: Re: [agi] How valuable is Solomonoff Induction for real world AGI?

On 11/8/07, Edward W. Porter [EMAIL PROTECTED] wrote:
> Jef, The paper cited below is more relevant to Kolmogorov complexity than Solomonoff induction. I had thought about the use of subroutines before I wrote my questioning critique of Solomonoff induction. Nothing in it seems to deal with the fact that the descriptive length of reality's computations that create an event (the descriptive length that is more likely to affect the event's probability) is not necessarily correlated with the descriptive length of the sensations we receive from such events.

Edward - I'm sorry I'm not going to be able to provide much illumination for you at this time. Just the few sentences of yours quoted above, while of a level of comprehension equal or better than average on this list, demonstrate epistemological incoherence to the extent I would hardly know where to begin. This discussion reminds
RE: [agi] How valuable is Solomonoff Induction for real world AGI?
Edward,

For some reason, this list has become one of the most hostile and poisonous discussion forums around. I admire your determined effort to hold substantive conversations here, and hope you continue. Many of us have simply given up.
RE: [agi] How valuable is Solomonoff Induction for real world AGI?
Cool!

-----Original Message-----
From: Benjamin Goertzel [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 08, 2007 12:56 PM
To: agi@v2.listbox.com
Subject: Re: [agi] How valuable is Solomonoff Induction for real world AGI?

Yeah, we use Occam's razor heuristics in Novamente, and they are commonly used throughout AI. For instance, in evolutionary program learning one uses a "parsimony pressure" which automatically rates smaller program trees as more fit...

ben

On Nov 8, 2007 12:21 PM, Edward W. Porter [EMAIL PROTECTED] wrote:
> BEN However, the current form of AIXI-related math theory gives zero guidance regarding how to make a practical AGI.
>
> ED Legg's Solomonoff Induction paper did suggest some down-and-dirty hacks, such as Occam's razor. It would seem a Novamente-class machine could do a quick backward chaining of preconditions and their probabilities to guesstimate probabilities. That would be a rough function of a complexity measure. But actually it would be something much better, because it would be concerned not only with the complexity of elements and/or sub-events and their relationships, but also with their probabilities and those of their relationships.
>
> Edward W. Porter
> Porter Associates
> 24 String Bridge S12
> Exeter, NH 03833
> (617) 494-1722 Fax (617) 494-1822
> [EMAIL PROTECTED]
>
> -----Original Message-----
> From: Benjamin Goertzel [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, November 08, 2007 11:52 AM
> To: agi@v2.listbox.com
> Subject: Re: [agi] How valuable is Solomonoff Induction for real world AGI?
>
> > BEN [referring to Vlad's statement about AIXI's uncomputability] Now now, it doesn't require infinite resources -- the AIXItl variant of AIXI only requires an insanely massive amount of resources, more than would be feasible in the physical universe, but not an infinite amount ;-)
> >
> > ED So, from a practical standpoint, which is all I really care about, is it a dead end?
>
> "Dead end" would be too strong, IMO, though others might disagree. However, the current form of AIXI-related math theory gives zero guidance regarding how to make a practical AGI. To get practical guidance out of that theory would require some additional, extremely profound math breakthroughs, radically different in character from the theory as it exists right now. This could happen. I'm not counting on it, and I've decided not to spend time working on it personally, as fascinating as the subject area is to me.
>
> > Also, do you, or anybody, know if Solmononoff (the only way I can remember the name is "Soul man on off", like Otis Redding with a microphone problem) Induction has the ability to deal with deep forms of non-literal similarity matching in its complexity calculations. And if so, how? And if not, isn't it brain dead? And if it is brain dead, why is such a bright guy as Shane Legg spending his time on it?
>
> Solomonoff induction is mentally all-powerful. But it requires infinitely much computational resources to achieve this ubermentality.
>
> -- Ben G
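[Editor's illustration.] Ben's "parsimony pressure" is easy to illustrate; a minimal Python sketch (the scoring function and weight are made up for illustration, not Novamente's actual code): candidate program trees are ranked by accuracy minus a penalty proportional to size, so at equal accuracy the smaller tree wins.

    # Occam's-razor heuristic as used in evolutionary program learning:
    # fitness = accuracy - parsimony_weight * size. A tree is a nested tuple
    # ("op", child, child, ...) or a leaf; all numbers here are illustrative.

    PARSIMONY_WEIGHT = 0.01  # hypothetical trade-off constant

    def tree_size(tree):
        if not isinstance(tree, tuple):
            return 1
        return 1 + sum(tree_size(child) for child in tree[1:])

    def fitness(tree, accuracy):
        return accuracy - PARSIMONY_WEIGHT * tree_size(tree)

    candidates = [
        (("add", "x", ("mul", "x", "x")), 0.95),                              # compact
        (("add", "x", ("mul", "x", ("add", "x", ("sub", "x", "x")))), 0.95),  # bloated
    ]
    best = max(candidates, key=lambda c: fitness(*c))
    print(best[0])  # the smaller tree wins at equal accuracy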
Re: [agi] How valuable is Solomonoff Induction for real world AGI?
On 11/8/07, Edward W. Porter [EMAIL PROTECTED] wrote:
> Jef, In your below flame you spent much more energy conveying contempt than knowledge.

I'll readily apologize again for the ineffectiveness of my presentation, but I meant no contempt.

> Since I don't have time to respond to all of your attacks,

Not attacks, but (overly) terse pointers to areas highlighting difficulty in understanding the problem due to difficulty framing the question.

> MY PRIOR POST: "...affect the event's probability..."
> JEF'S PUT-DOWN 1: More coherently, you might restate this as "...reflect the event's likelihood..."

I (ineffectively) tried to highlight a thread of epistemic confusion involving an abstract observer interacting with and learning from its environment. In your paragraph, I find it nearly impossible to find a valid base from which to suggest improvements. If I had acted more wisely, I would have tried first to establish common ground **outside** your statements and touched lightly and more constructively on one or two points.

> MY COMMENT: At Dragon Systems, then one of the world's leading speech recognition companies, I was repeatedly told by our in-house PhD in statistics that likelihood is the measure of a hypothesis matching, or being supported by, evidence. Dragon selected speech recognition word candidates based on the likelihood that the probability distribution of their model matched the acoustic evidence provided by an event, i.e., a spoken utterance.

If you said "Dragon selected word candidates based on their probability distribution relative to the likelihood function supported by the evidence provided by acoustic events", I'd be with you there. As it is, when you say "based on the likelihood that the probability...", it seems you are confusing the subjective with the objective, and, for me, meaning goes out the door.

> MY PRIOR POST: "...the descriptive length of sensations we receive..."
> JEF'S PUT-DOWN 2: Who is this "we" that receives sensations? Holy homunculus, Batman, seems we have a bit of qualia confusion thrown into the mix!
> MY COMMENT: Again, I did not know that I would be attacked for using such a common English usage as "we" on this list. Am I to assume that you, Jef, never use the words "we" or "I" because you are surrounded by friends so kind as to rudely say "Holy homunculus, Batman" every time you do?

Well, I meant to impart a humorous tone, rather than to be rude, but again I offer my apology; I really should have known it wouldn't be effective. I highlighted this phrasing, not for the colloquial use of "we", but because it again demonstrates epistemic confusion impeding comprehension of a machine intelligence interacting with (and learning from) its environment. To conceptualize any such system as receiving sensation, as opposed to expressing sensation, for example, is wrong in systems-theoretic terms of stimulus, process, response. And this confusion, it seems to me, maps onto your expressed difficulty grasping the significance of Solomonoff induction.

> Or, just perhaps, are you a little more normal than that? In addition, the use of the word "we", or even "I", does not necessarily imply a homunculus. I think most modern understanding of the brain indicates that human consciousness is most probably -- although richly interconnected -- a distributed computation that does not require a homunculus. I like and often use Bernard Baars' Theater of Consciousness metaphor.

Yikes! Well, that goes to my point. Any kind of Cartesian theater in the mind, silent audience and all -- never mind the experimental evidence for gaps, distortions, fabrications, confabulations in the story putatively shown -- has no functional purpose. In systems-theoretical terms, this would entail an additional processing step of extracting relevant information from the essentially whole content of the theater, which is not only unnecessary but intractable. The system interacts with 'reality' without the need to interpret it.

> But none of this means it is improper to use the words "we" or "I" when referring to ourselves or our consciousnesses.

I'm sincerely sorry to offend you. It takes even more time to attempt to repair, it impairs future relations, and clearly it didn't convey any useful understanding -- evidenced by your perception that I was criticizing your use of English.

> And I think one should be allowed to use the word "sensation" without being accused of qualia confusion. Jef, do you ever use the word "sensation", or would that be too confusing for you?

"Sensation" is a perfectly good word and concept. My point is that sensation is never "received" by any system, that such phrasing smacks of qualia confusion, and that such a misconception gets in the way of understanding how a machine intelligence might deal with sensation in practice.

> So, Jef, if Solomonoff induction is really a concept that can help me get a more coherent model of reality, I would really appreciate someone who had the understanding,
Re: [agi] How valuable is Solomonoff Induction for real world AGI?
On 11/8/07, Edward W. Porter [EMAIL PROTECTED] wrote:
> In my attempt to respond quickly I did not intend to attack him or his paper

Edward - I never thought you were attacking me. I certainly did attack some of your statements, but I never attacked you. It's not my paper, just one that I recommended to the group as relevant and worthwhile.

- Jef
Re: [agi] How valuable is Solomonoff Induction for real world AGI?
On 11/8/07, Edward W. Porter [EMAIL PROTECTED] wrote:
> HOW VALUABLE IS SOLOMONOFF INDUCTION FOR REAL WORLD AGI?

I will use the opportunity to advertise my equation extraction of the Marcus Hutter UAI book. There is also a section at the end about Juergen Schmidhuber's ideas, from the older AGI'06 book. (Sorry, the bibliography is not generated yet.)

http://www.ii.uni.wroc.pl/~lukstafi/pmwiki/uploads/AGI/UAI.pdf
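[Editor's note.] For orientation, the object at the center of that material is Solomonoff's universal prior; in standard notation,

    M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)},

the summed weight of all programs p that make a universal monotone machine U output something beginning with x. Prediction is then the ratio M(x_{n+1} \mid x_{1:n}) = M(x_{1:n} x_{n+1}) / M(x_{1:n}).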
RE: [agi] How valuable is Solomonoff Induction for real world AGI?
memory of such states. A significant percentage of these viewers can all respond at once, in their own way, to what is in the spotlight of the consciousness, and there is a mechanism for rapidly switching the spotlight in response to audience reactions, reactions which can include millions of dynamic dimensions.

With regard to your statement that "The system interacts with 'reality' without the need to interpret it": that sounds even more mind-denying than Skinner's Behaviorism. At least Skinner showed enough respect for the mind to honor it with a black box. I guess we are to believe that perception, cognition, planning, and understanding happen without any interpretation; they are all just direct look-up. Even Kolmogorov and Solomonoff at least accord it the honor of multiple programs, and ones that can be quite complex at that -- complex enough to even do interpretation.

Ed Porter

Edward W. Porter
Porter Associates
24 String Bridge S12
Exeter, NH 03833
(617) 494-1722 Fax (617) 494-1822
[EMAIL PROTECTED]

-----Original Message-----
From: Jef Allbright [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 08, 2007 4:22 PM
To: agi@v2.listbox.com
Subject: Re: [agi] How valuable is Solomonoff Induction for real world AGI?

On 11/8/07, Edward W. Porter [EMAIL PROTECTED] wrote:
> Jef, In your below flame you spent much more energy conveying contempt than knowledge.

I'll readily apologize again for the ineffectiveness of my presentation, but I meant no contempt.

> Since I don't have time to respond to all of your attacks,

Not attacks, but (overly) terse pointers to areas highlighting difficulty in understanding the problem due to difficulty framing the question.

> MY PRIOR POST: "...affect the event's probability..."
> JEF'S PUT-DOWN 1: More coherently, you might restate this as "...reflect the event's likelihood..."

I (ineffectively) tried to highlight a thread of epistemic confusion involving an abstract observer interacting with and learning from its environment. In your paragraph, I find it nearly impossible to find a valid base from which to suggest improvements. If I had acted more wisely, I would have tried first to establish common ground **outside** your statements and touched lightly and more constructively on one or two points.

> MY COMMENT: At Dragon Systems, then one of the world's leading speech recognition companies, I was repeatedly told by our in-house PhD in statistics that likelihood is the measure of a hypothesis matching, or being supported by, evidence. Dragon selected speech recognition word candidates based on the likelihood that the probability distribution of their model matched the acoustic evidence provided by an event, i.e., a spoken utterance.

If you said "Dragon selected word candidates based on their probability distribution relative to the likelihood function supported by the evidence provided by acoustic events", I'd be with you there. As it is, when you say "based on the likelihood that the probability...", it seems you are confusing the subjective with the objective, and, for me, meaning goes out the door.

> MY PRIOR POST: "...the descriptive length of sensations we receive..."
> JEF'S PUT-DOWN 2: Who is this "we" that receives sensations? Holy homunculus, Batman, seems we have a bit of qualia confusion thrown into the mix!
> MY COMMENT: Again, I did not know that I would be attacked for using such a common English usage as "we" on this list. Am I to assume that you, Jef, never use the words "we" or "I" because you are surrounded by friends so kind as to rudely say "Holy homunculus, Batman" every time you do?

Well, I meant to impart a humorous tone, rather than to be rude, but again I offer my apology; I really should have known it wouldn't be effective. I highlighted this phrasing, not for the colloquial use of "we", but because it again demonstrates epistemic confusion impeding comprehension of a machine intelligence interacting with (and learning from) its environment. To conceptualize any such system as receiving sensation, as opposed to expressing sensation, for example, is wrong in systems-theoretic terms of stimulus, process, response. And this confusion, it seems to me, maps onto your expressed difficulty grasping the significance of Solomonoff induction.

> Or, just perhaps, are you a little more normal than that? In addition, the use of the word "we", or even "I", does not necessarily imply a homunculus. I think most modern understanding of the brain indicates that human consciousness is most probably -- although richly interconnected -- a distributed computation that does not require a homunculus. I like and often use Bernard Baars' Theater of Consciousness metaphor.

Yikes! Well, that goes to my point. Any kind of Cartesian theater in the mind, silent audience and all -- never mind the experimental evidence for gaps, distortions, fabrications, confabulations in the story putatively shown -- has no functional purpose. In systems-theoretical terms, this would
RE: [agi] How valuable is Solomonoff Induction for real world AGI?
Lukasz Stafiniak wrote, in part, on Thu 11/8/2007 11:54 AM:

LUKASZ ## I think the main point is: Bayesian reasoning is about conditional distributions, and Solomonoff / Hutter's work is about conditional complexities. (Although directly taking conditional Kolmogorov complexity didn't work; there is a paragraph about this in Hutter's book.)

ED ## What is the value or advantage of conditional complexities relative to conditional probabilities?

LUKASZ ## When you build a posterior over TMs from all that vision data using the universal prior, you are looking for the simplest cause; you get the probability of similar things, because similar things can be simply transformed into the thing in question; moreover, you get it summed with the probability of things that are similar in the induced model space.

ED ## What's a TM? Also, are you saying that the system would develop programs for matching patterns, and then patterns for modifying those patterns, etc., so that similar patterns would be matched by programs that called a routine for a common pattern, but then other patterns to modify them to fit different perceptions? So are the programs just used for computing Kolmogorov complexity, or are they also used for generating and matching patterns? Does it require that the programs exactly match a current pattern being received, or does it know when a match is good enough that it can be relied upon as having some significance? Can the programs learn that similar but different patterns are different views of the same thing? Can they learn a generalizational and compositional hierarchy of patterns? Can they run on massively parallel processing? Hutter's expectimax tree appears to alternate levels of selection and evaluation. Can the expectimax tree run in reverse and in parallel, with information coming up from low sensory levels, and then being selected based on their relative probability, and then having the selected lower-level patterns being fed as inputs into higher-level patterns, and then repeating that process? That would be a hierarchy that alternates matching and then selecting the best-scoring match at alternate levels of the hierarchy, as is shown in the Serre article I have cited so many times before on this list.

LUKASZ ## You scared me... Check again, it's like in Solomon the king.

ED ## Thanks for the correction. After I sent the email I realized the mistake, but I was too stupid to parse it as Solomon-(the King)-off. I was stuck in thinking of Sol-om-on-on-off, which is hard to remember, and that is why I confused it. Solomon-(the King)-off is much easier to remember. I have always been really bad at names, foreign languages, and particularly spelling.

LUKASZ ## Yes, it is all about non-literal similarity matching, like you said in a later post: finding a library that makes for very short codes for a class of similar things.

ED ## Are these short codes sort of like Wolfram little codelettes, that can hopefully represent complex patterns out of very little code, or do they pretty much represent subsets of visual patterns as small bit maps?

Ed Porter

-----Original Message-----
From: Lukasz Stafiniak [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 08, 2007 11:54 AM
To: agi@v2.listbox.com
Subject: Re: [agi] How valuable is Solomonoff Induction for real world AGI?

On 11/8/07, Edward W. Porter [EMAIL PROTECTED] wrote:
> VLADIMIR NESOV IN HIS 11/07/07 10:54 PM POST SAID:
> VLADIMIR Hutter shows that the prior can be selected rather arbitrarily without giving up too much.

BTW: There is a point in Hutter's book that I don't fully understand: the belief contamination theorem. Is the contamination reintroduced at each cycle in this theorem? (The only way it makes sense.)

> (However, I have read that for complex probability distributions the choice of the class of mathematical model you use to model the distribution is part of the prior-choosing issue, and can be important, but that did not seem to be addressed in the Solomonoff Induction paper. For example, in some speech recognition, each speech frame model has a pre-selected number of dimensions, such as FFT bins (or related signal-processing derivatives), and each dimension is represented not by a Gaussian but rather by a basis function comprised of a set of a selected number of Gaussians.)

Yes. The choice of Solomonoff and Hutter is to take a distribution over all computable things.

> It seems to me that when you don't have much frequency data, we humans normally make a guess based on the probability of similar things, as suggested in the Kemp paper I cited. It seems to me that is by far the most commonsensical approach. In fact, due to the virtual omnipresence of non-literal similarity in everything we see and hear (e.g., the same face virtually never hits V1 exactly the same), most of our probabilistic thinking is dominated by similarity-derived probabilities.

I think