Fuming,

Could you please explain your approach a little? From the numbers you
quote, this sounds extremely positive, but I have a hard time understanding
how you achieve them. Take as an example my single-threaded, bitmap-based
light-playout implementation, which does 100k playouts/sec on 9x9 on my
2.4 GHz laptop. With 110 moves/playout, this works out to roughly 220
clock-cycles/move. When I quickly looked up the Cyclone III specification, I
saw that the clock speed for this FPGA tops out around 240 MHz, yet you
achieve 15x the throughput at one tenth of the clock speed, i.e., you are
150x more efficient per clock-cycle. This means about 1.5 clock-cycles/move.
Since you cannot rely on the instruction pipelining of a CPU (someone
measured ~2 assembly instructions/clock-cycle for my bitmap approach), this
leaves me with some questions. First, are you running a single-threaded
application, or playing on multiple boards at once? Second, are you just
replaying moves, or also generating them on the fly (about half of the time
is spent there in my implementation, more if you include updating the data
structure to make that possible)? Third, are we using the same definitions?
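As a quick check of the cycle-budget arithmetic in the paragraph above, here
is a small sketch. The helper name `cycles_per_move` is hypothetical, and the
inputs are only the figures quoted in this thread; the 240 MHz is the
approximate Cyclone III maximum, not a measured clock of Fuming's design.

```python
def cycles_per_move(clock_hz: float, playouts_per_sec: float,
                    moves_per_playout: float) -> float:
    """Clock cycles available per move: clock rate divided by moves per second."""
    moves_per_sec = playouts_per_sec * moves_per_playout
    return clock_hz / moves_per_sec


# Figures quoted in this thread (the FPGA clock is the approximate
# Cyclone III maximum, not a measured clock of Fuming's design).
cpu = cycles_per_move(2.4e9, 100e3, 110)           # about 218 cycles/move
fpga = cycles_per_move(240e6, 1500e3, 110)         # about 1.45 cycles/move
fpga_if_moves = cycles_per_move(240e6, 1500e3, 1)  # 160 cycles/move if "1500k" were moves/sec

print(f"CPU:  {cpu:.0f} cycles/move")
print(f"FPGA: {fpga:.2f} cycles/move")
print(f"efficiency ratio: {cpu / fpga:.0f}x")
print(f"FPGA, reading 1500k as moves/sec: {fpga_if_moves:.0f} cycles/move")
```

The last line is the alternative reading, where "1500k" counts moves rather
than playouts. It is the one that leaves a plausible budget of cycles per move.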

For instance, I would find it much easier to believe that you achieved
1500k moves/sec rather than 1500k playouts/sec (with each playout being
~110 moves). That would be about 160 clock-cycles/move, which sounds doable
if you can avoid branching, memory lookups, and miscellaneous calculations
by exploiting fine-grained parallelism in your FPGA code and specializing
functions on a per-grid-point basis. In a CPU-based application, this leads
to code bloat that becomes counter-productive at some point, may not be
feasible in all cases, and is harder to maintain. For an FPGA-based
application, however, this sounds entirely possible (though I don't know
anything about FPGAs).
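To make the point about avoiding branching through fine-grained parallelism
more concrete, here is a minimal Python sketch of a bitmap 9x9 board. It is
not René's or Fuming's code. The board layout (bit index r * 9 + c) and all
names (`neighbors`, `liberties`, `group_of`, `popcount`) are hypothetical. A
single shift-and-mask expression computes the neighbors of every point at
once, with no per-point branches. In hardware, each bit position could be its
own small circuit. On a CPU, the same idea is limited by word width: 81 bits
do not fit in a 64-bit register, which Python's arbitrary-size integers hide.

```python
N = 9                          # board size (9x9), points indexed r * N + c
BOARD = (1 << (N * N)) - 1     # mask of all 81 on-board points


def bit(r: int, c: int) -> int:
    """Bitmap with only point (r, c) set."""
    return 1 << (r * N + c)


# Column masks so horizontal shifts cannot wrap from one row into the next.
COL_FIRST = sum(bit(r, 0) for r in range(N))
COL_LAST = sum(bit(r, N - 1) for r in range(N))


def neighbors(b: int) -> int:
    """Points orthogonally adjacent to any point in b (excluding b itself).

    Every grid point is handled by the same shift-and-mask expression,
    so there is no per-point branching.
    """
    east = (b << 1) & ~COL_FIRST & BOARD   # c -> c+1; drop wraps into column 0
    west = (b >> 1) & ~COL_LAST            # c -> c-1; drop wraps into column N-1
    south = (b << N) & BOARD               # r -> r+1; drop points past the last row
    north = b >> N                         # r -> r-1; points above row 0 fall off
    return (east | west | south | north) & ~b


def liberties(group: int, empty: int) -> int:
    """Empty points adjacent to a group."""
    return neighbors(group) & empty


def group_of(seed: int, stones: int) -> int:
    """Flood-fill from seed through connected stones of one color."""
    group = seed
    while True:
        grown = group | (neighbors(group) & stones)
        if grown == group:
            return group
        group = grown


def popcount(x: int) -> int:
    """Number of set bits (x is assumed non-negative)."""
    return bin(x).count("1")
```

For a lone stone in the center, `popcount(liberties(bit(4, 4), BOARD & ~bit(4, 4)))`
is 4. In a corner it is 2. Note that the flood fill in `group_of` still loops
a data-dependent number of times, so only the per-step neighbor computation
is branch-free.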

Thanks,

René van de Veerdonk

On Sat, Jun 12, 2010 at 10:37 AM, Fuming Wang <[email protected]> wrote:

>
> Cyclone III
> 120,000 logical elements
> cycle time is linear in the number of moves needed to finish a game, which
> is approximately linear in the square of the board size.
>
> Fuming
>
>
>> - What FPGA? Virtex-6? Spartan-6?
>> - What size is the core in LUTs?
>> - Is your cycle time linear in the board size or in the number of
>> squares (i.e. quadratic in board size)? Or something else?
>>
>> --
>> GCP
>> _______________________________________________
>> Computer-go mailing list
>> [email protected]
>> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>>