Re: [lkcl@lkcl.net: has gcc been reworked so that code/templates can be "outsourced" e.g. to perl yet?]

Luke Kenneth Casson Leighton Mon, 16 May 2005 02:58:18 -0700

On Sun, May 15, 2005 at 02:43:57PM -0700, Mike Stump wrote:
> On Sunday, May 15, 2005, at 01:01  PM, Luke Kenneth Casson Leighton 
> wrote:
> >unfortunately, integration of aspex's proprietary tool-chain - written
> >in modula-2 - is extremely unlikely to ever be integrated into gcc.
> 
> Right.  But the ideas could be.  The ideas in some respects are more 
> important than the code.


 okay.

 the key architectural things about the ASP processor are as follows:

 * per-APE (qty 4096) 128 bits of content-addressable memory registers
 * per-APE (qty 4096) 256 bits of memory-registers
 * a 2-bit pipelined ALU
 * every APE is connected to its neighbour (left and right).
 * the APE string can be subdivided into 16-long segments at
   ARBITRARY boundaries, and also cyclically looped back
 

 the key things about the instruction set are as follows:

 * you can "tag" certain APEs such that only the "tagged" APEs will
   execute the next instruction.

   see below for a bool example involving valarray.

 * an 8-bit, 16-bit or 32-bit "compare" in the CAM memory, in one
   instruction cycle.

   this is _highly_ significant for data recognition: it's the
   one part of the ASP that _doesn't_ go at "bit-level" speed.

   obviously, the compare needs to be on an 8-bit, 16-bit or 32-bit
   boundary in the 128 bits of CAM.

   you _can't_ do 8-bit, 16-bit or 32-bit compares in the 256
   bits of memory-registers.  it's not CAM - it's ordinary memory
   cells.

 * you can shuffle bits left and right down the APE neighbour
   communications bus.  if the "cyclic loop" is enabled, bits
   dropping out of the end of a segment come back in at the other end,
   wherever that end has been programmatically set.

   it's equivalent to the valarray "shift" and "cshift" functions,
   but not quite - because you can "break" the string into arbitrary
   lengths.

   ... quite an interesting data logistics problem, there :)

 * an instruction that checks, down the length of a string of APEs,
   where each run of 1s in a particular register bit starts and where
   it ends, with the results ending up in _two_ "tag" registers (run
   starts in tag 1, run ends in tag 2):

   bit 5 in the APEs: 0 0 1 1 1 1 1 0 0 1 1 0 0 0 0 1 1 1 1  results in:
   tag 1 register     0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0
   tag 2 register     0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1

   this is typically used to implement the "carry" of a parallelised
   add instruction.

   each APE performs an 8-bit add over 8 instruction cycles; the 9th
   instruction is one of these "carry" instructions, and on the 10th
   the tag2 register is shuffled right by one APE and added in as the
   "carry" bit.

 * an instruction that checks whether a particular bit is set in
   _all_ APEs:

   bit 5 in all APEs: 1 1 1 1 1 1 1 1 1 1 0 1 1 1
   tag 1 register   : 0 0 0 0 0 0 0 0 0 0 0 0 0 0

   bit 5 in all APEs: 1 1 1 1 1 1 1 1 1 1 1 1 1 1
   tag 1 register   : 1 1 1 1 1 1 1 1 1 1 1 1 1 1

   this can also, iirc, be "qualified" by tagging... hm, not sure.
   it's been a long time, but if it _can_, then it would go like this:

   bit 5 in all APEs: 1 1 1 0 1 0 1 1 1 1 0 1 1 1
   tag 1 register   : 1 1 0 0 1 0 1 1 1 1 1 1 1 1
   tag 2 register   : 0 0 0 0 0 0 0 0 0 0 0 0 0 0

   bit 5 in all APEs: 1 1 1 0 1 0 1 1 1 1 1 1 1 1
   tag 1 register   : 1 1 0 0 1 0 1 1 1 1 1 1 1 1
   tag 2 register   : 1 1 0 0 1 0 1 1 1 1 1 1 1 1

   i.e. a 0 in the tag1 register means "we don't give a stuff" about
   that APE: only the APEs selected by tag1 are checked, and the
   result lands in tag2.

 * likewise an instruction that checks whether a particular bit is
   clear in _all_ APEs.


 from this, it should be _very_ obvious that valarray is
 _highly_ suited to hardware acceleration by an ASP:

 valarray<bool> b(20);
 valarray<int>  x(20);
 valarray<int>  y(20);

 ..... obtain x data...
 ..... obtain y data...

 b = (x != 19);
 x[b] += y[b];

 i.e. _only_ in those elements where x is not equal to 19 (where b
 is true), add the corresponding array element of y.

 this is _exactly_ the sort of thing where APE "tagging" allows an
 instruction to be conditionally performed - in parallel.


 the only thing that definitely _doesn't_ exist conceptually
 in valarray is that "carry-equivalent" function.

 it's just not something that people other than Aspex have ever
 thought about, because of course nobody in their right minds does
 bit-level programming any more :)

 l.
