I think processor-to-memory and inter-processor communications are currently far short of what is needed.
-----Original Message-----
From: Matt Mahoney [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 12, 2008 12:33 PM
To: [email protected]
Subject: RE: [agi] IBM, Los Alamos scientists claim fastest computer

Matt Mahoney >##########>> I think the ratio of processing power to memory to bandwidth is just about right for AGI.

Ed Porter >##########>> I tend to think otherwise. I think the current processor-to-RAM and processor-to-processor bandwidths are too low. (PLEASE CORRECT ME IF YOU THINK ANY OF MY CALCULATIONS OR STATEMENTS BELOW ARE INCORRECT)

The average synapse fires roughly once per second. The brain has roughly 10^12 - 10^15 synapses (the lower figure is based on some people's claim that only 1% of synapses are really effective). Since each synapse activation involves at least two memory accesses (at minimum a read-modify-write), that would produce roughly a similar number of memory accesses per second. Because of the high degree of irregularity and non-locality of connections in the brain, many such accesses would have to be modeled by non-sequential RAM accesses. Since --- as is stated below in more detail --- a current processor can only average roughly 10^7 non-sequential read-modify-writes per second, that means 10^5 - 10^8 processors would be required just to access RAM at the same rate the brain accesses memory at its synapses, with 10^5 probably being a low number.

But a significant fraction of the equivalent of synapse activations would require inter-processor communication in an AGI made out of current computer hardware. If one has only on the order of 10^5 processors, load balancing becomes an issue, and to minimize it you actually want a fair amount of non-locality of memory. (For example, when they put Shastri's Shruti cognitive architecture on a Thinking Machine, they purposely randomized the distribution of data across the machine's memory to promote load balancing.)
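The processor-count estimate above can be reproduced with a quick back-of-the-envelope script. This is only a sketch using the figures assumed in the email (10^12 to 10^15 synapses firing ~once per second, and ~10^7 non-sequential read-modify-writes per second per processor), not measured numbers:

```python
# Back-of-envelope check of the synapse-access argument above.
# Assumptions (from the email): 1e12..1e15 synapses, each firing
# ~1/sec, each activation counted as one non-sequential
# read-modify-write, and ~1e7 such RMWs/sec per current processor.

def processors_needed(synapses, fires_per_sec=1.0, rmw_per_sec_per_cpu=1e7):
    """Processors required to match the brain's synaptic memory traffic."""
    accesses_per_sec = synapses * fires_per_sec
    return accesses_per_sec / rmw_per_sec_per_cpu

low = processors_needed(1e12)   # effective-synapse lower bound
high = processors_needed(1e15)  # full synapse count
print(f"{low:.0e} to {high:.0e} processors")  # 1e+05 to 1e+08
```

This matches the 10^5 - 10^8 range claimed above; doubling the per-activation access count (a full separate read and write) would double the processor counts accordingly.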
(Load balancing is not an issue in the brain, since the brain has the equivalent of a simple, but parallel, processor for roughly every 100 to 10K synapses.)

Thus, you are probably talking in terms of needing to send something in the rough ballpark of 10^9 to 10^12 short inter-processor messages a second. To do this without congestion problems, you are probably going to need a theoretical bandwidth 5 to 10 times that.

One piece of hardware that would be a great machine to run test AGIs on is the roughly $60M TACC Ranger supercomputer in Austin, TX. It includes 15,700 AMD quad-cores, for over 63K cores, and about 100TB of RAM. Most importantly, it has Sun's very powerful Constellation system switch with 3,456 (an easy-to-remember number) 20Gbit bi-directional InfiniBand ports, which is a theoretical cross-sectional bandwidth of roughly 6.9TByte/sec. If the average spreading-activation message were 32 bytes, if messages were packed into larger blocks to reduce per-message costs, and if you assumed only roughly 10 percent of the total capacity was used on average to prevent congestion, that would allow roughly 20 billion global messages a second, with each of the 3,456 roughly quad-processor quad-core nodes receiving about 5 million per second. (If anybody has any info on how many random memory accesses a quad-processor quad-core node can do per second, I would be very interested --- I am guessing between 80 and 320 million/sec.)

I would not be surprised if the Ranger's inter-processor and processor-to-RAM bandwidth is one or two orders of magnitude too low for many types of human-level thinking, but it would certainly be enough to do very valuable AGI research, and to build powerful intelligences that would be in many ways more powerful than human.

Matt Mahoney >##########>> Processing power and memory increase at about the same rate under Moore's Law.
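The Ranger message-rate figures above can be checked with a short calculation, taking the email's own assumptions at face value (6.9 TB/s theoretical cross-sectional bandwidth, 32-byte messages, 10% average utilization):

```python
# Sanity check of the Ranger message-rate estimate above.
# Assumptions (from the email): ~6.9 TB/s theoretical cross-sectional
# bandwidth on the Constellation switch, 32-byte spreading-activation
# messages, and only 10% average utilization to avoid congestion.

BISECTION_BW_BYTES = 6.9e12   # bytes/sec, cross-sectional
MSG_BYTES = 32                # per spreading-activation message
UTILIZATION = 0.10            # headroom to prevent congestion
NODES = 3456

global_msgs_per_sec = BISECTION_BW_BYTES * UTILIZATION / MSG_BYTES
per_node = global_msgs_per_sec / NODES
print(f"{global_msgs_per_sec:.2e} msgs/sec, {per_node:.2e} per node")
# ~2.2e10 global (the 'roughly 20 billion' above), ~6e6 per node,
# i.e. on the order of the 'about 5 million per second' quoted.
```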
Ed Porter >##########>> Yes, but the frequency of non-sequential processor-to-memory accesses has increased much more slowly. (This may change in the future with the kind of massively multi-core chips Sam Adams says he is now working on: chips with built-in high-bandwidth mesh networks and, say, 10 RAM layers over each processor, the layers connected by through-silicon vias. If each such multi-layer chip were in turn connected by hundreds of high-bandwidth communication channels, that could help change this. So could processor-in-memory chips.)

Matt Mahoney >##########>> The time it takes a modern computer to clear all of its memory is on the same order as the response time of a neuron, and this has not changed much since ENIAC and the Commodore 64. It would seem easier to increase processing density than memory density, but we are constrained by power consumption, heat dissipation, network bandwidth, and the lack of software and algorithms for parallel computation. Bandwidth is about right too. A modern PC can simulate about 1 mm^3 of brain tissue with 10^9 synapses at 0.1 ms resolution or so.

Ed Porter >##########>> I think a PC is too slow to do the modeling you describe, because of processor-to-RAM latencies. As of 1999, a 730 MHz PC could only non-sequentially access RAM at about 10 MHz. Today, I think the latencies are still at least 30 ns, and the transfer of a 64-byte cache line on a 64-bit-wide bus would take 8 clock edges, or four 2.5 ns cycles of a 400 MHz bus clock, for another 10 ns. So the limit would probably still be under 25M random accesses per second today. There are other delays --- such as the time required (1) to find out whether the desired memory is in L1 and then L2 cache, (2) to do a virtual address translation, and (3) to pass the memory request and then the returned data through the chip's bus interface --- which I have not included.
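The latency arithmetic above can be sketched as a quick calculation (a sketch using only the email's assumed figures: ~30 ns DRAM latency, a 64-byte line over a 64-bit DDR bus at 400 MHz):

```python
# The random-access latency arithmetic above, as a quick calculation.
# Assumptions (from the email): ~30 ns DRAM latency, 64-byte cache line
# over a 64-bit (8-byte) bus, DDR transfer (2 edges/cycle) at 400 MHz.

LATENCY_NS = 30.0
LINE_BYTES = 64
BUS_BYTES = 8                      # 64-bit-wide bus
EDGES = LINE_BYTES // BUS_BYTES    # 8 clock edges for the whole line
CYCLES = EDGES / 2                 # DDR: 2 edges per cycle -> 4 cycles
CYCLE_NS = 1e9 / 400e6             # 2.5 ns at 400 MHz

transfer_ns = CYCLES * CYCLE_NS              # 10 ns
accesses_per_sec = 1e9 / (LATENCY_NS + transfer_ns)
print(f"{accesses_per_sec:.1e} random accesses/sec")
# 2.5e7 -- the 'under 25M accesses per second' ceiling above
```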
Multi-core or hyperthreading might produce some improvement by overlapping the latencies, but at best you would not be exceeding 40M random memory accesses per second. (If anyone has hard figures on these numbers, please tell me.) Since you are assuming modeling of 10^9 synapses, and since the connections are highly irregular, most of your synapse updating would be done as non-sequential RAM accesses. Since such an update would require at least a read and a write, you could only do roughly 1 to 2 x 10^7 per second. This is optimistic, and it is close to two orders of magnitude slower than what you have said above.

Matt Mahoney >##########>> Nerve fibers have a diameter around 1 or 2 microns, so a 1 mm cube would have about 10^6 of these transmitting 10 bits per second, or 10 Mb/s.

Ed Porter >##########>> I will assume your calculation is correct. But since the cortex is roughly 1600 cm^2 in area, there would be roughly 160,000 of these 1 mm^3 cubes (and corresponding PCs in your calculation), each with your above-stated 10 Mb/s of bandwidth, for a total inter-cube bandwidth of 1.6 x 10^12 b/s. It should be noted that since each cube would be attached to many other cubes, the bandwidth required if such interconnect were modeled by a simple mesh would be even higher.

But as I have said above, you would really need about 50 to 100 interconnected PCs to have the memory-accessing capability of one 1 mm^3 cube, and you would need a roughly similar density of interconnect between them, arguably multiplying the total inter-processor interconnect fifty- to a hundred-fold, to roughly 8 x 10^13 to 1.6 x 10^14 b/s.

Plus, it must be remembered that much of the information contained in a synapse or nerve fiber is conveyed by the location, strength, and type (such as neurotransmitter) of its connections. This is equivalent to probably at least a 4- or 8-byte address associated with every message, plus at least a byte or so for strength and connection type.
If you have one message a second, that is an extra 40 or 80 bits added to the 10 bits/sec per nerve fiber mentioned above.

Matt Mahoney >##########>> Similar calculations for larger cubes show locality with bandwidth growing at O(n^2/3). This could be handled by an Ethernet cluster with a high-speed core using off-the-shelf hardware.

Ed Porter >##########>> But as stated above, the brain has very fine-grained processing power and a high ratio of processing units to memory, so it can take advantage of locality of memory without the load-balancing problems most current computer-hardware equivalents would have; these locality-of-reference calculations would probably have to be altered accordingly.

Matt Mahoney >##########>> I don't know if it is coincidence that these 3 technologies are in the right ratio, or if it is driven by the needs of software that complements the human mind. -- Matt Mahoney, [EMAIL PROTECTED]

Ed Porter >##########>> NET-NET: There are possible savings in the amount of hardware required to emulate human-level thought, resulting from certain ways in which computer hardware greatly outperforms wetware --- such as much crisper memory, the ability to perform long sequences of operations with much greater exactness, a greater ability to rapidly load and save memory, and the ability to rapidly change function through programmability. But except for the possibility of such savings, it would appear that the amount of current-style computer hardware required to produce human-level thought is substantially greater than your above estimate indicates. The human brain has been brilliantly designed by evolution to perform massive spreading-activation computing in an extremely efficient manner.
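The inter-cube bandwidth figures in the exchange above can be reproduced with a short calculation, again taking the emails' own assumptions (1600 cm^2 of cortex modeled as 1 mm^3 cubes at 10 Mb/s each, and 50-100 PCs per cube):

```python
# Reproducing the inter-cube bandwidth estimate from the exchange above.
# Assumptions (from the emails): cortex area ~1600 cm^2, modeled as
# 1 mm^3 cubes at 10 Mb/s each (Mahoney's figure), and Porter's 50-100
# PCs needed per cube to match its random-memory-access rate.

CORTEX_CM2 = 1600
CUBES = CORTEX_CM2 * 100          # 100 mm^2 per cm^2 -> 160,000 cubes
BITS_PER_CUBE = 10e6              # 10 Mb/s per 1 mm^3 cube

total_bps = CUBES * BITS_PER_CUBE            # 1.6e12 b/s between cubes
low, high = total_bps * 50, total_bps * 100  # 50-100 PCs per cube
print(f"{total_bps:.1e} b/s; with 50-100 PCs/cube: {low:.1e} - {high:.1e} b/s")
# 1.6e12 b/s base; 8e13 - 1.6e14 b/s with the PC multiplier, as above
```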
--- On Thu, 6/12/08, Derek Zahn <[EMAIL PROTECTED]> wrote:

From: Derek Zahn <[EMAIL PROTECTED]>
Subject: RE: [agi] IBM, Los Alamos scientists claim fastest computer
To: [email protected]
Date: Thursday, June 12, 2008, 11:36 AM

Two things I think are interesting about these trends in high-performance commodity hardware:

1) The "flops/bit" ratio (processing power vs. memory) is skyrocketing. The move to parallel architectures makes the number of high-level "operations" per transistor go up, but bits of memory per transistor in large memory circuits don't go up. The old "bit per op/s" or "byte per op/s" rules of thumb get really broken on things like Tesla (0.03 bit/flops). Of course we don't know the ratio needed for de novo AGI or brain modeling, but the assumptions about processing vs. memory certainly seem to be changing.

2) Much more than previously, effective utilization of processor operations requires incredibly high locality (processing cores only have immediate access to very small memories). This is also referred to as "arithmetic intensity". This is of course because parallelism causes "operations per second" to expand much faster than methods for increasing memory bandwidth to large banks. Perhaps future 3D layering techniques will help with this problem, but for now AGI paradigms hoping to cache in (yuk yuk) on these hyper-increases in FLOPS need to be geared to high arithmetic intensity.

Interestingly (to me), these two things both imply that we get to increase the complexity of neuron and synapse models beyond the "muladd/synapse + simple activation function" model with essentially no degradation in performance, since the bandwidth of propagating values between neurons is the bottleneck much more than local processing inside the neuron model.
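The 0.03 bit/flops figure for Tesla above can be illustrated with rough numbers. The memory and FLOPS values below are assumed round figures for a 2008-era Tesla-class card, not exact specs for any particular model:

```python
# Illustrating the flops/bit point above with assumed round numbers
# (~1.5 GB of device memory, ~400 GFLOPS peak; hypothetical figures,
# not exact specs for any specific Tesla model).

MEM_BITS = 1.5e9 * 8     # ~1.5 GB of device memory, in bits
PEAK_FLOPS = 400e9       # ~400 GFLOPS peak

ratio = MEM_BITS / PEAK_FLOPS
print(f"{ratio:.2f} bit/flops")
# ~0.03 bit/flops, far below the old 'bit per op/s' rule of thumb
```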
-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: http://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
