WHAT SORT OF HARDWARE $33K AND $850K BUYS TODAY FOR USE IN AGI
On Wednesday, June 25, US East Coast time, I had an interesting phone
conversation with Dave Hart, in which we discussed just how much hardware
the current buck buys, at the amounts of money AGI research teams
using OpenCog (THE LUCKY ONES) might have available to them.
After our talk I checked the cost of current servers at Dell (the
easiest place I knew of to check prices). I found that hardware, and
particularly memory, was somewhat cheaper than Dave and I had thought. But
it is still sufficiently expensive that moderately funded projects will be
greatly limited by processor-memory and inter-processor bandwidth in how
much spreading activation and inferencing they can do.
A RACK MOUNTABLE SERVER WITH 4 QUAD-CORE XEONS, WITH EACH PROCESSOR HAVING
8MB OF CACHE, AND THE WHOLE SERVER HAVING 128GBYTES OF RAM AND FOUR 300GBYTE
HARD DRIVES WAS UNDER $30K. The memory stayed roughly constant in price per
GByte going from 32 to 64 to 128 GBytes. Of course you would probably have
to pay several extra grand for software and warranties. SO LET US SAY THE
PRICE IS $33K PER SERVER.
A 24 port 20Gbit/sec infiniband switch with cables and one 20Gbit/sec
adapter card for each of 24 servers would be about $52K.
SO A TOTAL SYSTEM WITH 24 SERVERS, 96 PROCESSORS, 384 CORES, 768MBYTES OF L2
CACHE, 3 TBYTES OF RAM, 28.8 TBYTES OF DISK, AND THE 24 PORT 20GBIT/SEC
SWITCH WOULD COST ROUGHLY $850K.
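The cost and capacity arithmetic above can be checked in a few lines (all figures are the rough estimates from this post, not vendor quotes):

```python
# Rough cost and capacity totals for the proposed 24-server system.
servers = 24
cost_per_server = 33_000        # ~$30K hardware plus software/warranties
switch_and_adapters = 52_000    # 24-port 20Gbit/sec infiniband switch + cards

total_cost = servers * cost_per_server + switch_and_adapters
print(total_cost)                  # 844000 -- i.e. roughly $850K

print(servers * 4)                 # 96 quad-core processors
print(servers * 16)                # 384 cores
print(servers * 128)               # 3072 GBytes, ~3 TBytes of RAM
print(servers * 4 * 300 / 1000)    # 28.8 TBytes of disk
```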
That doesn't include air conditioning. I am guessing each server probably
draws about 400 watts, so 24 of them would be about 9600 watts--- about the
amount of heat of ten hair dryers running in one room. That would obviously
require some cooling, but I would not think it would be that expensive to
handle.
With regard to performance, such systems are not even close to human brain
level, but they should allow some interesting proofs of concept.
Performance
---------------------------------------
AI spreading activation often involves a fair amount of non-locality of
memory access. Unfortunately there is a real penalty for accessing RAM
randomly. Without interleaving, one article I read recently implied that
about 50ns is a short latency for a random memory access. So we will assume
20M random RAM accesses (randomRamOpps) per second per channel, and that an
average activation will take two accesses, a read and a write, so roughly
10M activations/sec per memory channel.
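Those per-channel numbers follow directly from the assumed latency:

```python
# Spreading-activation rate from random RAM accesses, under the assumptions
# above: ~50ns per uninterleaved random access, two accesses per activation.
latency_ns = 50
random_accesses_per_sec = 1e9 / latency_ns   # 20M randomRamOpps/sec/channel
accesses_per_activation = 2                  # one read plus one write

activations_per_sec = random_accesses_per_sec / accesses_per_activation
print(activations_per_sec)   # 10,000,000 activations/sec per memory channel
```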
Matt Mahoney has pointed out that spreading activation can be modeled by
matrix methods that let you access RAM with much higher sequential memory
accessing rates. He claimed he could process about a gigabyte of matrix
data a second. If one assumes each element in the matrix is 8 bytes, that
would be the equivalent of doing 125M activations a second, which is roughly
12.5 times faster (with just 2 bytes per element, it would be 50 times
faster, or 500M activations/sec).
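The streaming-versus-random comparison works out as follows, using the figures cited above:

```python
# Matrix streaming vs. random-access activation rates, per the text's figures.
stream_bytes_per_sec = 1e9     # ~1 GByte/sec of sequential matrix data
random_activations = 10e6      # 10M activations/sec per channel (from above)

for elem_bytes in (8, 2):
    matrix_activations = stream_bytes_per_sec / elem_bytes
    speedup = matrix_activations / random_activations
    print(elem_bytes, matrix_activations, speedup)
# 8-byte elements: 125M activations/sec, 12.5x faster
# 2-byte elements: 500M activations/sec, 50x faster
```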
If one assumes each of the 4 cores of each of the 4 processors could handle
a matrix at 1GByte/sec, and each element in the matrix was just 2 bytes,
that would be 8G 2-byte matrix activations/sec/server, and 192G matrix
activations/sec for the 24-server system. It is not clear how well this
could be made to work with the type of interconnectivity of an AGI. It is
clear there would be some penalty for sparseness, perhaps a large one. If
one used run-length encoding in a matrix that is read by rows, then a set of
columns whose values could fit in cache could be loaded into cache, and the
portions of all the rows relating to them could be read sequentially. Once
all the portions of all the rows relating to that subset of columns had been
processed, the process could be repeated for another set of columns whose
values would be read into cache. If this were done, one should be able to
get largely sequential memory access at high rates, but presumably this
would substantially increase the number of bytes per connection required in
the matrix representation, and would slow processing.
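A toy sketch of that column-blocked scheme: node values for one block of columns are held "in cache" while the matching slices of every row are streamed. All names here are illustrative, not from any real OpenCog code, and Python dicts stand in for the run-length-encoded rows.

```python
# Column-blocked sparse spreading activation (illustrative sketch).
def blocked_spread(rows, activations, block_size):
    """rows: list of {column_index: weight} dicts (sparse matrix by rows).
    activations: current node activation values (indexed by column).
    Processes one block of columns at a time, so their activation values
    could stay cache-resident while row slices are streamed sequentially."""
    n = len(activations)
    out = [0.0] * len(rows)
    for start in range(0, n, block_size):
        block = range(start, min(start + block_size, n))  # columns now "in cache"
        for i, row in enumerate(rows):
            # stream only the portion of this row touching the cached columns
            for c in block:
                w = row.get(c)
                if w is not None:
                    out[i] += w * activations[c]
    return out

# Tiny demo: two nodes receiving activation from three source nodes.
print(blocked_spread([{0: 1.0, 2: 2.0}, {1: 3.0}],
                     [1.0, 1.0, 1.0], block_size=2))   # [3.0, 3.0]
```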
But it is not clear how much the speed would decrease as the sparseness of
the matrix increases. It is also not clear how effective such methods would
be in the presence of inference control mechanisms, which would presumably
be filtering a high percentage of messages, and thus dynamically varying the
sparseness and its pattern.
Thus I think that, at least initially, while we are exploring different
types of activation-spreading and inferencing patterns, thinking in terms of
such matrix systems would be highly constraining, and a lot of attention
should be paid to the limit on AGI computing power caused by
processor-memory and inter-processor bandwidth limitations.
I am assuming below that our system has the type of quadcore Xeons Dave Hart
told me about Wednesday night, which he said were coming out soon and which
would have 4 separate memory channels per processor. I have assumed in my
estimates below that each channel can do 20M random RAM accesses/sec
(randomRamOpps). (You might get a higher number of RAM opps by interleaving
memory reads and writes on the same bus, but I don't know how much this can
speed up randomRamOpps/sec.) That would be 320M randomRamOpps/sec across the
16 channels on one server, and 7.68G randomRamOpps/sec for the whole
system. Divide those numbers in half for read-modify-writes to RAM.
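Aggregating the assumed per-channel rate across the system:

```python
# Random RAM access totals, from an assumed 20M randomRamOpps/sec/channel.
opps_per_channel = 20e6
channels_per_processor = 4
processors_per_server = 4
servers = 24

per_server = opps_per_channel * channels_per_processor * processors_per_server
per_system = per_server * servers
print(per_server)       # 320M randomRamOpps/sec per server
print(per_system)       # 7.68G randomRamOpps/sec for the whole system
print(per_system / 2)   # 3.84G random read-modify-writes/sec
```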
If one assumes an L2 cache access requires 7 clock cycles (which is what it
did in a P4), and if one assumes each processor's four cores could access
cache without any slowdown from contention (which they probably can't), then
for a 1.8GHz Xeon that would be a max of (1.8GHz/7)x4 = ~1G L2 cache
accesses/sec per processor. That would be an optimistic max of 4G L2 cache
accesses/sec for each server, and an optimistic max of 96G L2 cache
accesses/sec for the whole 24-server system.
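The same ceiling, computed out (the exact products come to slightly more than the rounded figures above):

```python
# Optimistic L2 cache access ceiling: 7-cycle latency (the P4 figure),
# no contention among a processor's four cores.
clock_hz = 1.8e9
l2_cycles = 7
cores_per_processor = 4

per_processor = clock_hz / l2_cycles * cores_per_processor
per_server = per_processor * 4     # 4 processors per server
per_system = per_server * 24       # 24 servers

print(per_processor / 1e9)   # ~1.03G accesses/sec, rounded to ~1G above
print(per_server / 1e9)      # ~4.1G, rounded to 4G
print(per_system / 1e9)      # ~98.7G, rounded to 96G
```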
If one assumes inter-server messages are 16-byte sub-messages packed into
16KByte infiniband packets, each packet would carry up to 1K sub-messages
(obviously, if spreading-activation messages are 32 bytes each, the number
would be half). If two machines just send messages between each other as
fast as they can at 20Gbit/sec, each node could both send and receive about
42.3M such 16-byte sub-messages/sec. Allowing for the possible contention of
24 machines sending to each other, and the fact that the desired message
flow may not be regular, let us assume optimistically that we can get 20% of
this messaging capacity, or 8.4M such 16-byte sub-msgs/sec per server. Over
the 24 servers that would be an average of roughly 200M inter-node 16-byte
sub-messages/sec. This interprocessor messaging rate is roughly 1/20th of
the rate at which the system can perform random read-modify-writes to RAM.
It is probable that at least several such read-modify-writes will be
involved in the sending and receiving of each such sub-msg, and often a
message from one graph node in one server will activate multiple graph nodes
in the server at which it is received, most of which activations will
require random read-modify-writes to RAM. So the inter-server bandwidth in
this system is probably roughly balanced against the number of random RAM
accesses each server can perform.
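The messaging estimate and its comparison to the random read-modify-write rate:

```python
# Inter-server messaging estimate, using the assumptions above.
packet_bytes = 16 * 1024
msg_bytes = 16
print(packet_bytes // msg_bytes)   # 1024 sub-messages per 16KByte packet

p2p_msgs_per_sec = 42.3e6          # point-to-point send+receive ceiling cited
usable_fraction = 0.20             # optimistic survival under 24-way contention
per_server = p2p_msgs_per_sec * usable_fraction
per_system = per_server * 24
print(per_server)                  # ~8.46M 16-byte sub-msgs/sec per server
print(per_system)                  # ~203M/sec over 24 servers

# Compare against the system's random read-modify-write rate:
rmw_per_sec = 3.84e9
print(rmw_per_sec / per_system)    # ~19, i.e. roughly 1/20th
```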
Below is a summary of these very rough, often quite optimistic, estimates of
the power of such a ~$33K server and the ~$850K 24-server system --- all
with the qualifications discussed above: (SOME OF THESE ESTIMATES MAY BE 2
TO 5 TIMES TOO HIGH)
==================================================
---FOR ONE ROUGHLY $33K SERVER
-------- FOR THE $850K 24 NODE SYSTEM
==================================================
---4 quadcore processors, 16 cores
------ 96 QUADCORE PROCESSORS, 384 CORES
---128GBytes RAM
-------- 3TBYTES RAM
---32MBytes of L2 cache
-------- 768MBYTES of L2 CACHE
---20Gbits/sec inter-server bandwidth
-------- 480GBITS/SEC INTER-SERVER BANDWIDTH
==================================================
---16GByte of matrix processing/sec
-------- 384 GBYTES OF MATRIX PROCESSING/SEC
---8G 2byte matrix elements processed/sec
-------- 192G 2BYTE MATRIX ELEMENTS PROCESSED/SEC
---2G 8byte matrix elements processed/sec
-------- 48G 8BYTE MATRIX ELEMENTS PROCESSED/SEC
---4G L2 cache accesses/sec (if no contention between cores)
-------- 96G L2 CACHE ACCESSES/SEC (if no contention between cores)
---320M randomRamOpps/sec (cache line reads or writes)
-------- 7.6G RANDOMRAMOPPS/SEC (cache line reads or writes)
---160M random cache line read-modify-writes/sec
-------- 3.8G RANDOM CACHE LINE READ-MODIFY-WRITES/SEC
---8.4M 16Byte inter-server sub-msg/sec(~1/20 of random r-m-writes)
-------- 200M 16BYTE INTER-SERVER SUB-MSG/SEC(~1/20 of random r-m-writes)
//one msg to another server could activate all connections from the sending
graph node to graph nodes in that other server, but the amount of such
message multiplication is limited by the number of random cache line
read-modify-writes.
==================================================
The take-home from this is that for $33K you can get a machine with
128GBytes of RAM and something in the ballpark of 160M random cache line
read-modify-writes/sec. This should be enough to demonstrate, to those
enlightened enough to understand AI concepts, the potential power of
promising AGI architectures. $33K is cheap enough that hopefully in a year
or so tens or hundreds of grad student projects will each be working with
one or more such systems on AGI problems (hopefully, many with OpenCog).
For $850K you can get 3 terabytes of RAM. That should be roughly enough to
store as much information as the brain of a rat. For example,
http://faculty.washington.edu/chudler/facts.html states the cerebral cortex
of a rat has a 6cm2 area. Since the cortex has roughly 10^5 neurons/mm2,
that's 6x10^7 neurons, and if you assume 10^4 synapses per neuron, that's
6x10^11 synapses. 3TBytes of RAM would allow an average of 5 bytes per
synapse. Even if you doubled or tripled the number of neurons to reflect
neurons in other parts of the brain, the $850K 24-server system would have
more than one byte per synapse (which is probably too low by one or two
orders of magnitude if you are not using a matrix representation, but in
the right ball park).
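The synapse arithmetic, using the cited figures:

```python
# Rat-brain storage estimate from the figures cited above.
cortex_area_mm2 = 6 * 100          # 6 cm^2 = 600 mm^2
neurons_per_mm2 = 1e5
synapses_per_neuron = 1e4

neurons = cortex_area_mm2 * neurons_per_mm2    # 6e7 cortical neurons
synapses = neurons * synapses_per_neuron       # 6e11 synapses
ram_bytes = 3e12                               # 3 TBytes of RAM

print(ram_bytes / synapses)   # 5.0 bytes of RAM per synapse
```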
When you get to processing and communicating power, however, the picture is
much bleaker. If you assume each neuron fires on average once a second, a
number I have read in some papers, that is 6x10^11 synapse activations/sec.
If you could do your activations at the matrix speeds indicated above, you
might be roughly in this ball park (though you might well be slowed down by
one or two orders of magnitude by things such as the sparseness of
interconnects and L2 cache access speeds). But if you are using random
accessing of RAM to do activations, you are only going to get about 1/200th
of this assumed rat-brain speed.
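The shortfall, with the rough numbers above (the exact ratio comes out somewhat under the "about 1/200th" quoted, which also allows for multiple read-modify-writes per activation):

```python
# Random-RAM activation rate vs. the assumed rat-brain rate.
synapse_activations_per_sec = 6e11   # 6x10^11 synapses firing ~once/sec
system_rmw_per_sec = 3.84e9          # random read-modify-writes/sec, 24 servers

shortfall = synapse_activations_per_sec / system_rmw_per_sec
print(shortfall)   # ~156x short of real time
```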
(It should be noted, however, that some people claim only about 1% of
synapses are actually functional, which if true would indicate that even
using random accessing of RAM you should be able to roughly simulate a rat
brain.)
(And of course, an AGI program running on the 24-server system would
probably be dealing at a higher level of abstraction than most of the
processing done in a rat's brain, such as at the word level, or at sensory
levels where a lot of the lower-level inputs have been preprocessed by more
efficient matrix or stream computing methods.)
Thus, current AGI projects are going to be limited in the amount of
spreading activation and inferencing they can do with the types of hardware
likely to be funded by typical academic projects. Once the hardware industry
starts selling hardware with much greater processor-memory and
inter-processor bandwidth --- such as 64 to 256 core chips, with the cores
connected by a high-bandwidth mesh network, and with each core connected by
through-silicon vias to multiple memory layers above it, providing fat buses
between RAM and each core --- AGIs will be able to demonstrate much greater
capability for a given amount of RAM.
But for those who can get enough funding for systems like the $33K server,
up to the $850K 24-server system, these should be powerful enough to provide
good testbeds for many AGI ideas.
Ed Porter