On Sun, May 15, 2005 at 02:43:57PM -0700, Mike Stump wrote:
> On Sunday, May 15, 2005, at 01:01 PM, Luke Kenneth Casson Leighton
> wrote:
> >unfortunately, integration of aspex's proprietary tool-chain - written
> >in modula-2 - is extremely unlikely to ever be integrated into gcc.
>
> Right. But the ideas could be. The ideas in some respects are more
> important than the code.

okay. the key architectural things about the ASP processor are as
follows:

* per-APE (qty 4096) 128 bits of content-addressable memory registers

* per-APE (qty 4096) 256 bits of memory-registers

* a 2-bit pipelined ALU

* every APE is connected to its neighbour (left and right).

* the APE string can be subdivided into 16-long segments at ARBITRARY
  boundaries, and also cyclically looped back

the key things about the instruction set are as follows:

* you can "tag" certain APEs such that only the "tagged" APEs will
  execute the next instruction. see below for a bool example
  involving valarray.

* an 8-bit, 16-bit or 32-bit "compare", in the CAM memory, in one
  instruction cycle. this is _highly_ significant for data
  recognition: it's the one part of the ASP that _doesn't_ go at
  "bit-level" speed.

  obviously, the compare needs to be on an 8-bit, 16-bit or 32-bit
  boundary in the 128 bits of CAM.

  you _can't_ do 8-bit, 16-bit or 32-bit compares in the 256 bits of
  memory-registers. it's not CAM - it's ordinary memory cells.

* you can shuffle bits left and right down the APE neighbour
  communications bus. if the "cyclic loop" is enabled, bits dropping
  out the end of a segment come back in at the other end, wherever
  that end has been programmatically set.

  it's equivalent to the valarray "shift" and "cshift" functions, but
  not quite - because you can "break" the string into arbitrary
  lengths.

  ... quite an interesting data logistics problem, there :)

* an instruction that checks, down the length of a string of APEs,
  where a run of a particular set bit in a register begins, and where
  that run ends, with the results ending up in _two_ "tag" registers:

  bit 5 in the APEs: 0 0 1 1 1 1 1 0 0 1 1 0 0 0 0 1 1 1 1

  results in:

  tag 1 register   : 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0
  tag 2 register   : 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1

  this is typically used to implement the "carry" of a parallelised
  add instruction.
  each APE is utilised to perform an 8-bit add, over 8 instruction
  cycles, and then the 9th instruction is one of these "carry"
  instructions, and in the 10th instruction the tag2 register is
  shuffled right one APE, and then added as a "carry" bit.

* an instruction that checks whether a particular bit is set in all
  APEs:

  bit 5 in all APEs: 1 1 1 1 1 1 1 1 1 1 0 1 1 1
  tag 1 register   : 0 0 0 0 0 0 0 0 0 0 0 0 0 0

  bit 5 in all APEs: 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  tag 1 register   : 1 1 1 1 1 1 1 1 1 1 1 1 1 1

  this can also, iirc, be "qualified" by tagging.... hm, not sure...
  it's been a long time, but if it _does_ then it would go like this:

  bit 5 in all APEs: 1 1 1 0 1 0 1 1 1 1 0 1 1 1
  tag 1 register   : 1 1 0 0 1 0 1 1 1 1 1 1 1 1
  tag 2 register   : 0 0 0 0 0 0 0 0 0 0 0 0 0 0

  bit 5 in all APEs: 1 1 1 0 1 0 1 1 1 1 1 1 1 1
  tag 1 register   : 1 1 0 0 1 0 1 1 1 1 1 1 1 1
  tag 2 register   : 1 1 0 0 1 0 1 1 1 1 1 1 1 1

  i.e. a 0 in the tag1 register tells you "we don't give a stuff"
  about that APE.

* likewise an instruction that checks whether all bits are clear.

from this, it should be _very_ obvious that valarray is _highly_
suited to hardware acceleration by an ASP:

    valarray<bool> b(20);
    valarray<int> x(20);
    valarray<int> y(20);

    ..... obtain x data...
    ..... obtain y data...

    b = (x != 19);
    x[b] += y[b];

i.e. _only_ in those elements where x is not equal to 19, add the
corresponding array element of y.

this is _exactly_ the sort of thing where APE "tagging" allows an
instruction to be conditionally performed - in parallel.

the only thing that definitely _doesn't_ exist conceptually in
valarray is that "carry-equivalent" function. it's just not something
that people other than Aspex have ever thought about, because of
course nobody in their right minds does bit-level programming any
more :)

l.
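
p.s. for anyone who wants to try the valarray example above, here is a minimal sketch of the masked add as ordinary C++. it only shows the valarray side of the analogy, not anything ASP-specific. the function name masked_add and its skip parameter are made up for illustration, and the explicit std::valarray<int>(...) around y[b] is just there to keep the types obvious.

```cpp
#include <cassert>
#include <valarray>

// Hypothetical helper (not from any library): add y[i] into x[i] only
// where x[i] != skip, mirroring "only tagged APEs execute the add".
std::valarray<int> masked_add(std::valarray<int> x,
                              const std::valarray<int>& y,
                              int skip)
{
    assert(x.size() == y.size());

    // The "tag": true for every element that should take part.
    std::valarray<bool> b = (x != skip);

    // x[b] is a mask_array view of the tagged elements of x;
    // y[b] (y is const) is a packed copy of the matching elements of y.
    x[b] += std::valarray<int>(y[b]);
    return x;
}
```

e.g. masked_add({1, 19, 3, 19, 5}, {10, 20, 30, 40, 50}, 19) leaves the two 19s alone and gives {11, 19, 33, 19, 55}.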
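
p.p.s. the two tag-producing instructions described above can be modelled in software like this. it's a rough sketch only: on the ASP each of these is a single instruction across all APEs, whereas this is a plain sequential loop. the names Bits, tag_run_edges and all_set_where_tagged are invented here, the end of the string is assumed to count as a run boundary (which is what the bit 5 example shows), and the "qualified by tag1" behaviour is the from-memory version described above, so treat that part as unconfirmed.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Bits = std::vector<int>;  // one 0/1 value per APE (illustrative type)

// Model of the "carry" helper instruction: first = tag1 (1 where a run
// of set bits starts), second = tag2 (1 where a run of set bits ends).
std::pair<Bits, Bits> tag_run_edges(const Bits& bit)
{
    const std::size_t n = bit.size();
    Bits tag1(n, 0), tag2(n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        if (!bit[i]) continue;
        const bool left_set  = (i > 0)     && bit[i - 1];
        const bool right_set = (i + 1 < n) && bit[i + 1];
        if (!left_set)  tag1[i] = 1;   // run begins at this APE
        if (!right_set) tag2[i] = 1;   // run ends at this APE
    }
    return {tag1, tag2};
}

// Model of the (possibly) tag-qualified "all set" check: APEs with
// tag1 == 0 are ignored. The result is a copy of tag1 if every tagged
// APE has the bit set, otherwise all zeros.
Bits all_set_where_tagged(const Bits& bit, const Bits& tag1)
{
    assert(bit.size() == tag1.size());
    for (std::size_t i = 0; i < bit.size(); ++i)
        if (tag1[i] && !bit[i])
            return Bits(bit.size(), 0);
    return tag1;
}
```

feeding the bit 5 patterns from the examples above into these two functions gives back the same tag 1 / tag 2 rows.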