The Cell supports both a SIMD-based and a table-lookup-based implementation.
It also has a bit-count instruction, which is useful for the feedback
tap calculation.
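Here is a minimal sketch of how a bit count gives the feedback bit: the parity of the tapped bits is popcount(register & tap mask) & 1. It is portable C, not Cell code. `__builtin_popcount` is a GCC/Clang builtin standing in for the Cell's bit-count instruction. The masks and helper names are my own, derived from the register layout in the Wikipedia A5/1 figure (R1: 19 bits, taps 13, 16, 17, 18; R2: 22 bits, taps 20, 21; R3: 23 bits, taps 7, 20, 21, 22).

```c
#include <stdint.h>

/* Register length masks and tap masks (bit positions from the A5/1 figure). */
#define R1_MASK 0x07FFFFu  /* 19 bits */
#define R2_MASK 0x3FFFFFu  /* 22 bits */
#define R3_MASK 0x7FFFFFu  /* 23 bits */
#define R1_TAPS 0x072000u  /* bits 13, 16, 17, 18 */
#define R2_TAPS 0x300000u  /* bits 20, 21 */
#define R3_TAPS 0x700080u  /* bits 7, 20, 21, 22 */

/* Feedback bit = parity of the tapped bits = popcount(reg & taps) & 1.
   __builtin_popcount (GCC/Clang) stands in for the Cell bit-count instruction. */
static uint32_t feedback(uint32_t reg, uint32_t taps)
{
    return (uint32_t)__builtin_popcount(reg & taps) & 1u;
}

/* Clock one register once: shift towards the MSB, insert feedback at bit 0,
   and drop the bit shifted out past the register length. */
static uint32_t clock_reg(uint32_t reg, uint32_t taps, uint32_t mask)
{
    return ((reg << 1) | feedback(reg, taps)) & mask;
}
```

For example, clocking R1 = 0x72000 (all four taps set, even parity) yields 0x64000 with a 0 fed back.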
With the 256 KB of SRAM (the SPU local store) you could process 5 bits
at once with table lookups instead of the 4 bits we use on NVIDIA GPUs.
But it may be faster to use the SIMD instructions: R1, R2 and R3 are each
shorter than 32 bits, so you can store each of them in bundles of 4 per
128-bit Cell register and compute four instances simultaneously.
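A sketch of the 4-lanes-per-register idea, assuming one A5/1 instance per 32-bit lane and branch-free majority clocking through all-ones/all-zero masks (a select). The `vec4` and `a51x4` types and `a51x4_clock` are hypothetical. The lanes are emulated with a plain loop in portable C; the loop body is written without branches so that it maps onto SIMD operations across all four lanes. The output bit is taken as the XOR of the three MSBs after clocking.

```c
#include <stdint.h>

#define LANES 4

/* Hypothetical emulation of a 128-bit register as four 32-bit lanes. */
typedef struct { uint32_t lane[LANES]; } vec4;

/* Four independent A5/1 instances: lane i of r1, r2 and r3 form instance i. */
typedef struct { vec4 r1, r2, r3; } a51x4;

/* Clock all four instances once with majority clocking and write each
   instance's output bit (XOR of the three MSBs after clocking) to out[]. */
static void a51x4_clock(a51x4 *s, uint32_t out[LANES])
{
    for (int i = 0; i < LANES; i++) {
        uint32_t r1 = s->r1.lane[i], r2 = s->r2.lane[i], r3 = s->r3.lane[i];

        /* clocking bits: R1 bit 8, R2 bit 10, R3 bit 10 */
        uint32_t c1 = (r1 >> 8) & 1u, c2 = (r2 >> 10) & 1u, c3 = (r3 >> 10) & 1u;
        uint32_t maj = (c1 & c2) | (c1 & c3) | (c2 & c3);

        /* all-ones where the clocking bit agrees with the majority, else 0 */
        uint32_t m1 = 0u - (1u ^ c1 ^ maj);
        uint32_t m2 = 0u - (1u ^ c2 ^ maj);
        uint32_t m3 = 0u - (1u ^ c3 ^ maj);

        /* feedback = XOR of the tap bits */
        uint32_t f1 = ((r1 >> 13) ^ (r1 >> 16) ^ (r1 >> 17) ^ (r1 >> 18)) & 1u;
        uint32_t f2 = ((r2 >> 20) ^ (r2 >> 21)) & 1u;
        uint32_t f3 = ((r3 >> 7) ^ (r3 >> 20) ^ (r3 >> 21) ^ (r3 >> 22)) & 1u;

        /* shifted candidates, masked to the register lengths 19, 22, 23 */
        uint32_t n1 = ((r1 << 1) | f1) & 0x07FFFFu;
        uint32_t n2 = ((r2 << 1) | f2) & 0x3FFFFFu;
        uint32_t n3 = ((r3 << 1) | f3) & 0x7FFFFFu;

        /* branch-free select between the shifted and the unchanged value */
        r1 = (n1 & m1) | (r1 & ~m1);
        r2 = (n2 & m2) | (r2 & ~m2);
        r3 = (n3 & m3) | (r3 & ~m3);

        s->r1.lane[i] = r1;
        s->r2.lane[i] = r2;
        s->r3.lane[i] = r3;
        out[i] = ((r1 >> 18) ^ (r2 >> 21) ^ (r3 >> 22)) & 1u;
    }
}
```

The masks replace a per-lane branch, which matters because the four instances generally disagree about which registers to clock.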
It may also be faster to store 8 bundles of R1 in 2 registers, because
all the taps of R1 are close together. When you shift R1, you would then
have to carry bits over from one 128-bit register to the next.
Using http://en.wikipedia.org/wiki/File:A5-1.png as reference, that
would mean bits 18-3 of AR1 (A5/1 register 1) are stored in CR1 (Cell
register 1) and bits 2, 1 and 0 of AR1 are in CR2; when you shift AR1,
bit 2 of CR2 becomes bit 0 of CR1.
Alternatively, you only transfer between CR1 and CR2 after every 8
clockings; then bits 2-10 of CR2 become bits 0-8 of CR1.
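A sketch of the two-register layout for R1 with the per-clock carry, assuming 8 instances in 8 x 16-bit lanes: CR1 holds R1 bits 18-3 and CR2 holds bits 2-0 in its low three bits. With this layout the taps 13, 16, 17, 18 are CR1 bits 10, 13, 14, 15 and the clocking bit 8 is CR1 bit 5, so only the shift touches CR2. The names are hypothetical, the lanes are emulated in portable C, and clocking is unconditional here; majority clocking would add a per-lane select mask.

```c
#include <stdint.h>

#define LANES16 8

/* Hypothetical layout: 8 instances of R1 in two 8 x 16-bit registers.
   cr1 lane = R1 bits 18..3 (lane bits 15..0), cr2 lane = R1 bits 2..0. */
typedef struct { uint16_t cr1[LANES16], cr2[LANES16]; } r1x8;

/* Split a 19-bit R1 value into lane i of the two registers. */
static void r1x8_load(r1x8 *v, int i, uint32_t r1)
{
    v->cr1[i] = (uint16_t)(r1 >> 3);
    v->cr2[i] = (uint16_t)(r1 & 7u);
}

/* Reassemble the 19-bit R1 value of lane i. */
static uint32_t r1x8_get(const r1x8 *v, int i)
{
    return ((uint32_t)v->cr1[i] << 3) | v->cr2[i];
}

/* Clock every lane once (unconditionally). */
static void r1x8_clock(r1x8 *v)
{
    for (int i = 0; i < LANES16; i++) {
        uint16_t a = v->cr1[i], b = v->cr2[i];
        /* taps 13, 16, 17, 18 of R1 are bits 10, 13, 14, 15 of cr1 */
        uint16_t fb = (uint16_t)(((a >> 10) ^ (a >> 13) ^ (a >> 14) ^ (a >> 15)) & 1u);
        /* carry: bit 2 of cr2 (R1 bit 2) becomes bit 0 of cr1 (R1 bit 3);
           the cast drops R1 bit 18, which is shifted out */
        v->cr1[i] = (uint16_t)((a << 1) | ((b >> 2) & 1u));
        v->cr2[i] = (uint16_t)(((b << 1) | fb) & 7u);
    }
}
```

Clocking the split lanes gives the same result as clocking a 19-bit R1 held in one word, e.g. 0x72000 becomes 0x64000 and 0x4 becomes 0x8 via the carry.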
The next crazy step would be to compute 16 8-bit bundles per register.
You would need 3 CRs to store each of the ARs, but you could then
compute 16 chains per core.
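For the 8-bit bundle variant, a small hypothetical helper can compute how many CRs an AR needs and where a given AR bit lands. It assumes the same convention as the CR1/CR2 split of R1: full w-bit chunks from the MSB down, with the remaining low bits in the last CR at their original positions.

```c
/* Number of w-bit lane registers (CRs) needed for an n-bit A5/1 register. */
static int cr_count(int n, int w)
{
    return (n + w - 1) / w;
}

/* Hypothetical layout: CR index 0 (CR1) holds the top w bits of the AR,
   each following CR the next w bits, and the last CR holds the remaining
   low bits at their original positions. Writes the 0-based CR index and
   the bit position inside that CR's lane for AR bit i (0 <= i < n). */
static void cr_locate(int n, int w, int i, int *cr, int *bit)
{
    int k = (n - 1 - i) / w;    /* chunk index counted from the MSB */
    int low = n - (k + 1) * w;  /* lowest AR bit stored in CR k */
    if (low < 0)
        low = 0;                /* last, partial CR starts at AR bit 0 */
    *cr = k;
    *bit = i - low;
}
```

For R3 with 8-bit lanes, for example, the clocking bit 10 lands in CR2 bit 3 (index 1), taps 20, 21 and 22 in CR1 bits 5, 6 and 7, and tap 7 in CR2 bit 0, so the R3 feedback needs bits from two CRs.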
It is hard to say how fast this would be, but a maximum of 8 SPUs at
3 GHz with 16 chains per core would give 8 * 3G * 16 / 64 (bits of
keystream per round) = 6 giga A5/1 rounds per second, still to be
divided by the number of cycles you need to compute one bit of A5/1.
Estimating that at 200 cycles, you would reach at most 30 million A5/1
rounds per second.
For comparison, a 9600M GT does 20 million and a GTX 260 does 160 million.
So even if your algorithm with a 16-wide SIMD layout needs 300
instructions per bit, you would still be in business.
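The estimate above written as a function, using only the numbers from the text. The function name and parameters are illustrative, and the result is the same upper bound the text computes, not a measurement.

```c
#include <stdint.h>

/* Upper-bound A5/1 rounds per second, following the estimate in the text:
   spus * clock_hz * chains / keystream_bits_per_round / cycles_per_bit.
   Illustrative helper, not taken from any existing code. */
static uint64_t a51_rounds_per_sec(uint64_t spus, uint64_t clock_hz,
                                   uint64_t chains, uint64_t bits_per_round,
                                   uint64_t cycles_per_bit)
{
    return spus * clock_hz * chains / bits_per_round / cycles_per_bit;
}
```

With 200 cycles per bit this gives 30 million rounds per second; with 300 it drops to 20 million, about the 9600M GT figure.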
I would very much like to see a Cell implementation. I bought myself a
PlayStation 3 and installed Linux, but I have not found the time to
dive into it. Even if Cells are not the fastest hardware out there,
they would make a significant contribution if we get access to clusters
of them.


On Wed, Oct 28, 2009 at 11:41:34AM -0400, James Evans wrote:
> I have access to a Cell cluster and am interested in porting the code
> over. Any suggestions? Have calculations been run to estimate speed?
> 
> Thanks
_______________________________________________
A51 mailing list
[email protected]
http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
