Go Fast: <[email protected]>:
>Kato, have you compared the speed of the simulations on PS3 SPE to the speed
>of the simulations on PC, given that the program is optimized for the CPU on
>both sides?

I've published the comparison in "A Study on Implementing Parallel 
MC/UCT Algorithm" (in Japanese)
http://www.geocities.jp/hideki_katoh/publications/gpw2007/gpw07-private.pdf
at GPW 2007.  The following is part of Table 1 on page 4 (modified).

CPU     Time    kpps    Ratio
Cell    830 us  1.2     1
x86     163 us  6.1     5.1

Cell runs about five times slower than x86 at almost the same clock 
(3.18 vs. 3.0 GHz).  This is much slower than expected and is due to my 
code not being optimized for the SPU, i.e., the same C code was used on both.

If I remember correctly, byte access on Cell is 3 to 7 times slower 
than on x86 because the SPU has only 16-byte load/store instructions.  
Looking at the generated code, loading a byte is simulated as follows: 
mask the lower 4 bits of the address, load 16 bytes, then shift and mask 
the data to place the target byte in the rightmost byte of the register.  
Thus, 4 instructions are needed for every byte fetch.  Storing a byte is 
more complex: mask, shift, and mask the address, load the 16 bytes that 
include the target byte into another register, mask, merge, and store 
back the whole 16 bytes.
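
To illustrate the sequence above, here is a minimal sketch in portable C 
that models a memory reachable only through aligned 16-byte loads and 
stores.  It is not the SPU compiler's output and does not use SPU 
intrinsics; the names qword, load_qword, store_qword, load_byte and 
store_byte are hypothetical, and addr is assumed to be an offset into a 
buffer whose size is a multiple of 16.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One 16-byte quadword, standing in for a 128-bit SPU register
 * (hypothetical model, not an SPU type). */
typedef struct { uint8_t b[16]; } qword;

/* The only memory operations allowed in this model: whole, aligned
 * 16-byte loads and stores.  The low 4 address bits are masked off. */
static qword load_qword(const uint8_t *mem, size_t addr)
{
    qword q;
    memcpy(q.b, mem + (addr & ~(size_t)15), sizeof q.b);
    return q;
}

static void store_qword(uint8_t *mem, size_t addr, qword q)
{
    memcpy(mem + (addr & ~(size_t)15), q.b, sizeof q.b);
}

/* Byte fetch: load the enclosing quadword, then select one byte. */
uint8_t load_byte(const uint8_t *mem, size_t addr)
{
    assert(mem != NULL);
    qword q = load_qword(mem, addr);
    return q.b[addr & 15];
}

/* Byte store: read-modify-write of the enclosing quadword. */
void store_byte(uint8_t *mem, size_t addr, uint8_t value)
{
    assert(mem != NULL);
    qword q = load_qword(mem, addr);   /* load the 16 bytes      */
    q.b[addr & 15] = value;            /* merge in the new byte  */
    store_qword(mem, addr, q);         /* store back all 16      */
}
```

Even in this simplified model a one-byte store touches all 16 bytes of 
the enclosing quadword, which is why byte-oriented board code maps 
poorly onto the SPU.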

I've implemented a bitboard representation of the 9 x 9 board for both 
processors, which is thought to best match the SPU's 128 128-bit-wide 
general registers.  For lack of time, I've not compared the 
simulation speed but only the execution time of the final_score() 
function, which uses a flood-fill algorithm, with the general registers 
on the SPU and the SSE (128-bit wide) registers on x86.  The result was 
almost the same (Cell was faster by 5% or less).
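
For readers unfamiliar with the idea, a bitboard flood fill on a 9 x 9 
board can be sketched as below.  This is not the final_score() code from 
Fudo Go; it is a portable C illustration in which two 64-bit words stand 
in for one 128-bit register, and the layout (10 bits per row, i.e. 9 
points plus one padding bit that stops fills from wrapping between rows) 
and all names (bb128, bb_point, bb_board, bb_count, flood_fill) are my 
own assumptions.

```c
#include <assert.h>
#include <stdint.h>

enum { SIZE = 9, STRIDE = 10 };  /* 9 points per row plus 1 padding bit */

/* Two 64-bit halves standing in for one 128-bit register. */
typedef struct { uint64_t lo, hi; } bb128;

static bb128 bb_and(bb128 a, bb128 b) { return (bb128){ a.lo & b.lo, a.hi & b.hi }; }
static bb128 bb_or(bb128 a, bb128 b)  { return (bb128){ a.lo | b.lo, a.hi | b.hi }; }
static int   bb_eq(bb128 a, bb128 b)  { return a.lo == b.lo && a.hi == b.hi; }

/* 128-bit shifts, valid for 0 < n < 64. */
static bb128 bb_shl(bb128 a, unsigned n)
{
    assert(n > 0 && n < 64);
    return (bb128){ a.lo << n, (a.hi << n) | (a.lo >> (64 - n)) };
}

static bb128 bb_shr(bb128 a, unsigned n)
{
    assert(n > 0 && n < 64);
    return (bb128){ (a.lo >> n) | (a.hi << (64 - n)), a.hi >> n };
}

/* Single-bit board for the point at (row, col), both in 0..8. */
bb128 bb_point(int row, int col)
{
    unsigned i = (unsigned)(row * STRIDE + col);
    return i < 64 ? (bb128){ 1ULL << i, 0 } : (bb128){ 0, 1ULL << (i - 64) };
}

/* All 81 board points; padding bits stay clear. */
bb128 bb_board(void)
{
    bb128 all = { 0, 0 };
    for (int r = 0; r < SIZE; r++)
        for (int c = 0; c < SIZE; c++)
            all = bb_or(all, bb_point(r, c));
    return all;
}

/* Number of set bits. */
int bb_count(bb128 a)
{
    int n = 0;
    for (; a.lo; a.lo &= a.lo - 1) n++;
    for (; a.hi; a.hi &= a.hi - 1) n++;
    return n;
}

/* Grow `seed` through 4-neighbours until it stops changing.
 * `open` must contain board points only (no padding bits), so that
 * shifted bits landing off the board are masked away. */
bb128 flood_fill(bb128 seed, bb128 open)
{
    bb128 region = bb_and(seed, open);
    for (;;) {
        bb128 grown = region;
        grown = bb_or(grown, bb_shl(region, 1));       /* east  */
        grown = bb_or(grown, bb_shr(region, 1));       /* west  */
        grown = bb_or(grown, bb_shl(region, STRIDE));  /* south */
        grown = bb_or(grown, bb_shr(region, STRIDE));  /* north */
        grown = bb_and(grown, open);
        if (bb_eq(grown, region))
            return region;
        region = grown;
    }
}
```

For example, if `open` is the full board minus a wall on column 4, 
filling from the corner (0, 0) reaches the 36 points of columns 0 to 3.  
Each iteration is a handful of whole-register shifts, ORs and ANDs, 
which is the kind of work 128-bit registers handle well.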

I will rewrite the MC simulator using bitboards on the SPU, but I have no 
time right now... :(

Hideki

>On Mon, Dec 15, 2008 at 6:40 AM, Hideki Kato <[email protected]> wrote:
>
>>
>> Darren Cook: <[email protected]>:
>> >> Advertisement: Fudo Go used a desktop pc (Intel Q9550) and _eight_
>> >> Playstation 3 consoles on a private Gigabit Ethernet LAN.
>> >
>> >Hello Kato-sensei,
>>
>> Hello Darren,
>>
>> BTW, I'm not a sensei (professor) but just a 55-year-old doctoral
>> student :).
>>
>> >Are you able to use all 8 cores of the playstation? So, with the 4 of
>> >the Q9550, 68 cores altogether? Do you, or your students, have any
>> >papers on the hardware challenges/solutions?
>>
>> Ordinary applications can in fact use not 8 but 7 cores, because one SPU
>> is used exclusively by the firmware to protect secured content.  The PPU
>> is used not for MC simulations but for communication over the
>> network etc.
>>
>> I used one core of the Intel CPU for the client (UCT tree searcher), the
>> other three for internal MC simulators, and 8 x 6 SPUs as external
>> simulators.  Thus, 51 cores are used for MC simulations in total.  The
>> eight PS3 consoles boosted Fudo Go by perhaps 2 or 3 stones (ranks) on
>> 19 x 19.  The difference in performance between 4 and 8 PS3s is
>> clear, although I'm not sure all 6 SPUs are working at full duty;
>> I'll study that soon.
>>
>> My last paper on parallel MCTS has no description of the
>> implementation for Cell BE.  I'll submit a longer paper this
>> month, but if you want to know the details of my implementation now, you
>> can have the source code of the Fudo-Go-2nd-UEC-Cup version, which is
>> exactly what I used for the tournament.
>> http://www.geocities.jp/hideki_katoh/release/fudo-go-2nd-uec-cup.tar.gz
>>
>> Hideki
>> --
>> [email protected] (Kato)
>> _______________________________________________
>> computer-go mailing list
>> [email protected]
>> http://www.computer-go.org/mailman/listinfo/computer-go/
>>
--
[email protected] (Kato)