We are able to run 167 games in parallel. Unfortunately, it is not easy to incorporate FPGA-based simulation into a UCT-based MCTS scheme. So we are trying a traditional alpha-beta tree search to evaluate the usefulness of our FPGA-based MC simulation.
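As a rough sanity check on the figures quoted in this thread, here is a small back-of-the-envelope sketch. It is not a measurement. It assumes that the 125 MHz clock, the 1500k playouts/sec throughput, and the 167 parallel games all describe the same 9x9 configuration, that the 167 games are all in flight at once, and that René's estimate of about 110 moves per playout applies. The function names are only illustrative.

```python
# Back-of-the-envelope arithmetic using figures quoted in this thread.
# Assumption: all four numbers describe the same 9x9 configuration.
CLOCK_HZ = 125e6            # FPGA board clock rate
PLAYOUTS_PER_SEC = 1500e3   # aggregate throughput over all parallel games
GAMES_IN_FLIGHT = 167       # games simulated in parallel
MOVES_PER_PLAYOUT = 110     # Rene's estimate for 9x9, not an FPGA measurement


def cycles_per_playout_aggregate(clock_hz, playouts_per_sec):
    """Clock cycles that elapse, on average, between finished playouts."""
    return clock_hz / playouts_per_sec


def cycles_per_move_single_game(clock_hz, playouts_per_sec, games, moves):
    """Clock cycles one game spends per move, if `games` are in flight.

    Uses Little's law: latency = (items in flight) / throughput.
    """
    latency_cycles = games * clock_hz / playouts_per_sec
    return latency_cycles / moves


if __name__ == "__main__":
    agg = cycles_per_playout_aggregate(CLOCK_HZ, PLAYOUTS_PER_SEC)
    per_move = cycles_per_move_single_game(
        CLOCK_HZ, PLAYOUTS_PER_SEC, GAMES_IN_FLIGHT, MOVES_PER_PLAYOUT
    )
    print(f"aggregate cycles per playout: {agg:.1f}")
    print(f"aggregate cycles per move:    {agg / MOVES_PER_PLAYOUT:.2f}")
    print(f"cycles per move, single game: {per_move:.1f}")
```

Under these assumptions the aggregate rate is under one clock cycle per move, while each individual game still spends on the order of a hundred cycles per move. That would be consistent with the point that the speedup only appears when many games are simulated for the same position.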
Best,
Fuming

On Wed, Jun 16, 2010 at 11:21 AM, Mark Boon <[email protected]> wrote:

> I must admit I had completely misread this. I thought it said 1500
> playouts/sec. and didn't give it a second glance, thinking this is
> just a first effort. 1500K playouts/sec. is a completely different
> ball-game.
>
> I suppose the question is: how many games do you need to compute in
> parallel to achieve this speed? And would you still be able to collect
> AMAF information on the playouts?
>
> Interesting...
>
> Mark
>
> On Tue, Jun 15, 2010 at 4:03 PM, Fuming Wang <[email protected]> wrote:
> > Hi Rene,
> >
> > Our design is fully pipelined, so we are able to simulate multiple
> > games simultaneously. The way in which simulations are run in FPGA
> > and in CPU is quite different, so direct comparison is not easy. If
> > we want to simulate just one game, FPGA implementation is not 10x
> > faster, however, if we want thousands of games simulated for a
> > single board position, then FPGA is 10x faster. So, we are getting
> > 1500k GAMES/sec, but only in the second sense.
> > The clock rate of our FPGA board is only 125 MHz, so with better
> > board/chip, we will still have 10-100 times improvement over the
> > current result.
> >
> > best,
> > Fuming
> >
> > On Wed, Jun 16, 2010 at 1:28 AM, René van de Veerdonk
> > <[email protected]> wrote:
> >>
> >> Fuming,
> >> Could you please explain your approach a little bit? From the
> >> numbers you quote, this sounds extremely positive, but I have a
> >> hard time understanding how you achieve them. Taking 100k
> >> playouts/sec for 9x9 on my 2.4 GHz laptop for my single-threaded
> >> bitmap based light-playout implementation as an example, with 110
> >> moves/playout, this results in a little less than 240
> >> clock-cycle/move. When I quickly looked up the Cyclone III
> >> specification, I saw that the clock-speed for this FPGA tops out
> >> around 240 MHz, yet you achieve 15x the throughput, i.e., you are
> >> 150x more efficient. This means 1.8 clock-cycle/move. Without being
> >> able to make use of pipe-lining inside the CPU (someone measured ~2
> >> assembly instructions/clock-cycle for my bitmap approach), this
> >> leads me to questions. First, are you running a single threaded
> >> application, or playing on multiple boards at once? Second, are you
> >> just replaying moves, or also generating them on the fly (about
> >> half of the time is spent there in my implementation, more if you
> >> include updating the data-structure to make that possible)? Third,
> >> are we using the same definitions?
> >> For instance, I would find it much more comprehensible to believe
> >> that you achieved to do 1500k moves/second instead of 1500k
> >> playouts/sec (with each playout being ~110 moves). 200
> >> clock-cycles/move sounds do-able if you can avoid branching, memory
> >> lookups, or miscellaneous calculations by creating fine-level
> >> parallelism in your FPGA-code and specializing functions on a per
> >> grid-point basis. In a CPU-based application, this results in
> >> code-bloat that will become counter-productive at some stage, may
> >> not be feasible in all instances, and is more difficult to
> >> maintain. For an FPGA-based application, however, this sounds
> >> entirely possible (not knowing anything about FPGAs).
> >> Thanks,
> >> René van de Veerdonk
> >>
> >> On Sat, Jun 12, 2010 at 10:37 AM, Fuming Wang <[email protected]>
> >> wrote:
> >>>
> >>> Cyclone III
> >>> 120,000 logical elements
> >>> cycle time is linear to the number of moves to finish a game,
> >>> which is approximately linear to the square of the board size.
> >>>
> >>> Fuming
> >>>
> >>>>
> >>>> - What FPGA? Virtex-6? Spartan-6?
> >>>> - What size is the core in LUTs?
> >>>> - Is your cycle time linear in the board size or in the number of
> >>>>   squares (i.e. quadratic in board size)? Or something else?
> >>>>
> >>>> --
> >>>> GCP
_______________________________________________ Computer-go mailing list [email protected] http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
