Fuming,

Could you please explain your approach a little bit? The numbers you quote sound extremely positive, but I have a hard time understanding how you achieve them.

Take my own implementation as an example: single-threaded, bitmap-based light playouts, 100k playouts/sec for 9x9 on my 2.4 GHz laptop. With 110 moves/playout, this works out to a little less than 240 clock cycles/move. When I quickly looked up the Cyclone III specification, I saw that the clock speed for this FPGA tops out around 240 MHz, yet you achieve 15x the throughput at a tenth of the clock rate, i.e., you are 150x more efficient per clock cycle. This means about 1.8 clock cycles/move. Without the pipelining available inside a CPU (someone measured ~2 assembly instructions/clock cycle for my bitmap approach), this leads me to some questions.

First, are you running a single-threaded application, or playing on multiple boards at once? Second, are you just replaying moves, or also generating them on the fly (about half of the time is spent there in my implementation, more if you include updating the data structure that makes that possible)? Third, are we using the same definitions?
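The back-of-the-envelope estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the inputs are the rough figures quoted in this message, the 240 MHz ceiling is my reading of the spec rather than a measured clock, and the helper name `cycles_per_move` is made up for illustration.

```python
# Rough cycle budget per move. All inputs are approximate figures
# from the discussion, not measurements of the FPGA design itself.

def cycles_per_move(clock_hz, events_per_sec, moves_per_event=1):
    """Clock cycles available per move at a given clock rate and throughput."""
    return clock_hz / (events_per_sec * moves_per_event)

MOVES_PER_PLAYOUT = 110  # approximate length of a 9x9 light playout

# CPU: 2.4 GHz, 100k playouts/sec
cpu = cycles_per_move(2.4e9, 100e3, MOVES_PER_PLAYOUT)

# FPGA, assuming a 240 MHz clock and 1500k *playouts*/sec
fpga_playouts = cycles_per_move(240e6, 1500e3, MOVES_PER_PLAYOUT)

# Same FPGA, if the 1500k figure counted *moves*/sec instead
fpga_moves = cycles_per_move(240e6, 1500e3)

print(f"CPU:                  {cpu:.0f} cycles/move")
print(f"FPGA (playouts/sec):  {fpga_playouts:.2f} cycles/move")
print(f"FPGA (moves/sec):     {fpga_moves:.0f} cycles/move")
print(f"Per-cycle efficiency: {cpu / fpga_playouts:.0f}x")
```

With these inputs the script gives roughly 218 cycles/move for the CPU and about 1.5 for the FPGA, the same ballpark as the rounded figures above, and a ratio of 150x. The third line shows how much the answer hinges on what exactly is being counted.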
For instance, I would find it much easier to believe that you achieved 1500k moves/second instead of 1500k playouts/sec (with each playout being ~110 moves). 200 clock cycles/move sounds doable if you can avoid branching, memory lookups, or miscellaneous calculations by creating fine-grained parallelism in your FPGA code and specializing functions on a per-grid-point basis. In a CPU-based application, this results in code bloat that becomes counterproductive at some stage, may not be feasible in all instances, and is more difficult to maintain. For an FPGA-based application, however, this sounds entirely possible (though I know next to nothing about FPGAs).

Thanks,

René van de Veerdonk

On Sat, Jun 12, 2010 at 10:37 AM, Fuming Wang <[email protected]> wrote:
>
> Cyclone III
> 120,000 logical elements
> cycle time is linear to the number of moves to finish a game, which is
> approximately linear to the square of the board size.
>
> Fuming
>
>> - What FPGA? Virtex-6? Spartan-6?
>> - What size is the core in LUT's?
>> - Is your cycle time linear in the board size or in the number of
>> squares (i.e. quadratic in board size)? Or something else?
>>
>> --
>> GCP
>> _______________________________________________
>> Computer-go mailing list
>> [email protected]
>> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
