--- "J. Andrew Rogers" <[EMAIL PROTECTED]> wrote:

> On Apr 30, 2008, at 11:41 AM, Matt Mahoney wrote:
> > By distributing the problem across the internet. AGI can be divided
> > into lots of specialized experts and a network for getting messages
> > to the right experts. http://www.mattmahoney.net/agi.html
>
> There are a few problems with your model that need to be fixed before
> it is legitimately viable, though you do acknowledge some of them in
> the paper:
>
> 1.) The protocol design is naive and will not scale up to the level
> you think it will, simplifying away by assumption topology
> characteristics where deviations from the assumption will have a
> major impact. There are no general, computable solutions to the
> underlying issues (neither in literature nor in unpublished research
> that I know of), and you gloss over or do not consider problems that
> would have a pathological expression if you actually tried to build
> it. This is an important and active area of mathematics research in
> a couple different fields.

Which protocol are you referring to, the one described on my web page or
the abstract one described in my thesis? In the abstract one I described
a network of n identical (but unreliable) peers, each connected to
c = O(log n) peers, using a vector space model with messages uniformly
distributed in d dimensions. In simulations it scales to large n but
does poorly when d > c. This would seem to preclude text indexing, where
d ~ 10^5. However, the simulation is the worst case. In practice, text
tends to cluster in a vector space, effectively reducing the number of
dimensions to a few hundred. Also, I expect computing resources to be
distributed unevenly: a few big nodes with large c and lots of smaller
ones with small c. An example would be connecting Google as a peer with
c ~ 10^9. What missing assumptions would break it?

> 2.) There is nothing in published literature that will do the kind of
> indexing you want to do in the spatial domain, but it is possible in
> theory.

Yes, that was my main challenge. My motivation in 1997 was the lack of
decentralized search capability for P2P networks. Then I forgot about it
while I investigated language modeling for AI and started my
dissertation on it (and then changed the topic in order to get funded).
Only last year did I see the application of distributed indexing to AGI.

> For your purposes in the broadest sense, things like kD-trees will
> drop dead for pretty trivial systems, never mind for something
> ambitious. On the other hand, generalized distribution with O(n)
> storage complexity was solved last year which may or may not address
> your issues.

Storage is O(n) using an organizational tree, but that would not be
robust. I think the O(log n) factor is not a big penalty. You want to
have more pointers per peer as the network grows, and you want to have
more cached and backup copies of messages.

Anyway I agree the protocol design needs more work before it is ready to
deploy.
I expect there will be many problems we didn't anticipate. I am in no hurry.

-- Matt Mahoney, [EMAIL PROTECTED]
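
As an illustration of the abstract network described in the reply above (n peers, each linked to c = O(log n) others, with messages treated as points in d dimensions), here is a minimal Python sketch. It is not the protocol from the thesis or the web page. The random choice of neighbours, the greedy forwarding rule, the Euclidean distance, and every function name are assumptions made for the example, and it makes no attempt to reproduce the simulation results mentioned in the message.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_network(n, d, rng):
    """Place n peers uniformly in [0,1]^d; give each c = ceil(log2 n) random links.

    Assumption: links are chosen uniformly at random, with no self-links.
    """
    c = max(1, math.ceil(math.log2(n)))
    k = min(c, n - 1)
    pos = [[rng.random() for _ in range(d)] for _ in range(n)]
    links = []
    for i in range(n):
        # Sample from n-1 slots and shift past i so a peer never links to itself.
        picks = rng.sample(range(n - 1), k)
        links.append([j if j < i else j + 1 for j in picks])
    return pos, links, c


def greedy_route(pos, links, start, target, max_hops=1000):
    """Forward toward target until no neighbour is closer; return (peer, hops)."""
    cur, hops = start, 0
    while hops < max_hops:
        best = min(links[cur], key=lambda j: dist2(pos[j], target))
        if dist2(pos[best], target) >= dist2(pos[cur], target):
            break  # local minimum: no neighbour improves on the current peer
        cur, hops = best, hops + 1
    return cur, hops


def success_rate(n, d, trials, seed=0):
    """Fraction of random messages that reach the globally nearest peer."""
    rng = random.Random(seed)
    pos, links, c = build_network(n, d, rng)
    hits = 0
    for _ in range(trials):
        target = [rng.random() for _ in range(d)]
        nearest = min(range(n), key=lambda i: dist2(pos[i], target))
        found, _ = greedy_route(pos, links, rng.randrange(n), target)
        hits += found == nearest
    return c, hits / trials
```

Calling `success_rate(1000, 3, 200)` and then `success_rate(1000, 50, 200)` gives a rough feel for how this toy router behaves as d grows relative to c. Because the links here are random rather than chosen by any indexing scheme, the numbers say nothing about the real design.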
