The Cell supports both a SIMD and a table-lookup based implementation. It also has a bit-count instruction, which is useful for the feedback tap calculation. With the 256 KB of SRAM you could process 5 bits at once with table lookups instead of 4 bits like we do on NVIDIA GPUs. But it may be faster to use the SIMD instructions and store R1, R2 and R3, each of which is smaller than 32 bits, in bundles of 4 in the 128-bit Cell registers, so that you can compute them simultaneously.

It may also be faster to store 8 bundles of R1 in 2 registers, because all the taps of R1 are close together. When you shift R1 you would have to carry a bit over from one of the 128-bit registers to the next. Using http://en.wikipedia.org/wiki/File:A5-1.png as reference, that would mean bits 18-3 of AR1 (A5/1 register 1) are stored in CR1 (Cell register 1) and bits 2, 1 and 0 of AR1 are in CR2, and when you shift AR1, bit 2 of CR2 becomes bit 0 of CR1. Or you only transfer between CR1 and CR2 after 8 clockings; then bits 2-10 of CR2 become bits 0-8 of CR1.

The next crazy thing would be calculating 16 bundles of 8 bits each, where you would need 3 CRs to store each of the ARs, but then you could calculate 16 chains per core.

It is hard to say how fast you can do it, but a maximum of 8 SPUs at 3 GHz with 16 chains per core would allow 8 * 3G * 16 / (64 bits of keystream per round) = 6 Giga-A5/1 rounds per second, still to be divided by the number of cycles you need to compute one bit of A5/1. Estimating that at 200, you would reach at most 30 million A5/1 rounds per second. A 9600M GT does 20 million, a GTX 260 does 160 million. So even if you need 300 instructions in your algorithm that uses a 16-wide SIMD layout, you would still be in business.

I would very much like to see a Cell implementation. I bought myself a PlayStation 3 and installed Linux, but I did not find the time to dive into it.
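To make the split-register idea and the speed estimate a bit more concrete, here is a rough Python sketch for a single lane. It is not Cell code and nobody has run this on an SPU. It assumes CR1 holds the upper 16 bits of AR1 (bits 18 down to 3) and CR2 holds the lower 3 bits, uses the usual R1 taps at bits 18, 17, 16 and 13, and clocks R1 on every step (the majority clocking rule is left out). All function and constant names are made up for the illustration.

```python
# Sketch only: one lane of A5/1 register R1, clocked unconditionally.
# The majority (stop/go) clocking of real A5/1 is deliberately omitted.

R1_BITS = 19
R1_MASK = (1 << R1_BITS) - 1
R1_TAPS = 0x072000            # feedback taps at bits 18, 17, 16 and 13

LO_BITS = 3                   # assumed: CR2 lane holds AR1 bits 2..0
HI_BITS = 16                  # assumed: CR1 lane holds AR1 bits 18..3
LO_MASK = (1 << LO_BITS) - 1
HI_MASK = (1 << HI_BITS) - 1
HI_TAPS = R1_TAPS >> LO_BITS  # all taps fall inside the CR1 lane


def parity(x):
    """Parity via a population count, like a bit-count instruction would give."""
    return bin(x).count("1") & 1


def clock_r1(r1):
    """Reference: shift the plain 19-bit register, feedback enters at bit 0."""
    return ((r1 << 1) & R1_MASK) | parity(r1 & R1_TAPS)


def split(r1):
    """Split a 19-bit R1 into the (CR1 lane, CR2 lane) pair."""
    return (r1 >> LO_BITS) & HI_MASK, r1 & LO_MASK


def join(hi, lo):
    """Inverse of split()."""
    return (hi << LO_BITS) | lo


def clock_r1_split(hi, lo):
    """Shift the split register: bit 2 of CR2 becomes bit 0 of CR1."""
    feedback = parity(hi & HI_TAPS)
    carry = (lo >> (LO_BITS - 1)) & 1
    hi = ((hi << 1) & HI_MASK) | carry
    lo = ((lo << 1) & LO_MASK) | feedback
    return hi, lo


def estimate_rounds_per_second(cycles_per_bit, spus=8,
                               clock_hz=3_000_000_000, lanes=16,
                               keystream_bits=64):
    """Upper bound from the mail: SPUs * clock * lanes / 64 / cycles per bit."""
    return spus * clock_hz * lanes // keystream_bits // cycles_per_bit


if __name__ == "__main__":
    state = 0x5A5A5
    hi, lo = split(state)
    for _ in range(64):
        state = clock_r1(state)
        hi, lo = clock_r1_split(hi, lo)
        assert join(hi, lo) == state
    print(estimate_rounds_per_second(200))
```

On the Cell the same shifts and masks would be applied to 8 such 16-bit lanes at once. The only part that differs from a plain shift is the one carry bit moved from CR2 to CR1.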
Even if Cells are not the fastest hardware out there, if we get access to clusters of them, they would make a significant contribution.
On Wed, Oct 28, 2009 at 11:41:34AM -0400, James Evans wrote:
> I have access to a Cell cluster and am interested in porting the code
> over. Any suggestions? Have calculations been run to estimate speed?
>
> Thanks
_______________________________________________
A51 mailing list
http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
