We can run 167 games in parallel. Unfortunately, it is not easy to
incorporate FPGA-based simulation into a UCT-based MCTS scheme, so we are
trying a traditional alpha-beta tree search to evaluate the usefulness of
our FPGA-based MC simulation.
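
As a rough sanity check on the figures quoted in this thread, the arithmetic
can be sketched in a few lines of Python. This is only a back-of-the-envelope
sketch, not a description of the FPGA design: the function names are made up
for illustration, the 110 moves/playout figure is René's 9x9 estimate (the
board size behind the 1500k figure is not stated here), and it assumes that
the 167 games in flight are what sustain 1500k games/sec at 125 MHz.

```python
def cycles_per_move(clock_hz, playouts_per_sec, moves_per_playout):
    """Clock cycles spent per move at a given aggregate playout rate."""
    return clock_hz / (playouts_per_sec * moves_per_playout)


def cycles_per_game_in_pipeline(clock_hz, playouts_per_sec, games_in_flight):
    """Cycles one game occupies the pipeline.

    Uses Little's law: latency = games in flight / throughput.
    """
    return games_in_flight * clock_hz / playouts_per_sec


MOVES_PER_PLAYOUT = 110  # assumption: René's 9x9 estimate

# CPU: 2.4 GHz laptop, 100k playouts/sec, single-threaded.
cpu = cycles_per_move(2.4e9, 100e3, MOVES_PER_PLAYOUT)

# FPGA: 125 MHz board clock, 1500k playouts/sec in aggregate.
fpga = cycles_per_move(125e6, 1500e3, MOVES_PER_PLAYOUT)

# FPGA: how long a single game stays in flight with 167 parallel games.
game_latency = cycles_per_game_in_pipeline(125e6, 1500e3, 167)
move_latency = game_latency / MOVES_PER_PLAYOUT

print(f"CPU cycles/move:            {cpu:.1f}")
print(f"FPGA aggregate cycles/move: {fpga:.2f}")
print(f"FPGA cycles/game in flight: {game_latency:.0f}")
print(f"FPGA cycles/move per game:  {move_latency:.1f}")
```

Under these assumptions each game would stay in the pipeline for roughly
14,000 cycles, or about 126 cycles per move, even though a finished game
comes out about every 83 cycles on average. That would fit the remark quoted
below that a single game is not faster on the FPGA while the aggregate rate
is.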

Best,
Fuming

On Wed, Jun 16, 2010 at 11:21 AM, Mark Boon <[email protected]> wrote:

> I must admit I had completely misread this. I thought it said 1500
> playouts/sec. and didn't give it a second glance, thinking this is
> just a first effort. 1500K playouts/sec. is a completely different
> ball-game.
>
> I suppose the question is: how many games do you need to compute in
> parallel to achieve this speed? And would you still be able to collect
> AMAF information on the playouts?
>
> Interesting...
>
>    Mark
>
> On Tue, Jun 15, 2010 at 4:03 PM, Fuming Wang <[email protected]> wrote:
> > Hi Rene,
> >
> > Our design is fully pipelined, so we are able to simulate multiple games
> > simultaneously. The way in which simulations are run on an FPGA and on a
> > CPU is quite different, so direct comparison is not easy. If we want to
> > simulate just one game, the FPGA implementation is not 10x faster. However,
> > if we want thousands of games simulated for a single board position, then
> > the FPGA is 10x faster. So, we are getting 1500k GAMES/sec, but only in the
> > second sense. The clock rate of our FPGA board is only 125 MHz, so with a
> > better board/chip, we will still have a 10-100 times improvement over the
> > current result.
> >
> > best,
> > Fuming
> >
> > On Wed, Jun 16, 2010 at 1:28 AM, René van de Veerdonk
> > <[email protected]> wrote:
> >>
> >> Fuming,
> >> Could you please explain your approach a little bit? From the numbers you
> >> quote, this sounds extremely positive, but I have a hard time understanding
> >> how you achieve them. Taking 100k playouts/sec for 9x9 on my 2.4 GHz laptop
> >> for my single-threaded, bitmap-based light-playout implementation as an
> >> example, with 110 moves/playout, this results in a little less than 240
> >> clock-cycles/move. When I quickly looked up the Cyclone III specification,
> >> I saw that the clock speed for this FPGA tops out around 240 MHz, yet you
> >> achieve 15x the throughput, i.e., you are 150x more efficient. This means
> >> 1.8 clock-cycles/move. Without being able to make use of pipelining inside
> >> the CPU (someone measured ~2 assembly instructions/clock-cycle for my
> >> bitmap approach), this leads me to questions. First, are you running a
> >> single-threaded application, or playing on multiple boards at once? Second,
> >> are you just replaying moves, or also generating them on the fly (about
> >> half of the time is spent there in my implementation, more if you include
> >> updating the data structure to make that possible)? Third, are we using the
> >> same definitions?
> >> For instance, I would find it much easier to believe that you achieved
> >> 1500k moves/second instead of 1500k playouts/sec (with each playout being
> >> ~110 moves). 200 clock-cycles/move sounds doable if you can avoid
> >> branching, memory lookups, or miscellaneous calculations by creating
> >> fine-level parallelism in your FPGA code and specializing functions on a
> >> per-grid-point basis. In a CPU-based application, this results in code
> >> bloat that will become counter-productive at some stage, may not be
> >> feasible in all instances, and is more difficult to maintain. For an
> >> FPGA-based application, however, this sounds entirely possible (not
> >> knowing anything about FPGAs).
> >> Thanks,
> >> René van de Veerdonk
> >>
> >> On Sat, Jun 12, 2010 at 10:37 AM, Fuming Wang <[email protected]> wrote:
> >>>
> >>> Cyclone III
> >>> 120,000 logic elements
> >>> Cycle time is linear in the number of moves needed to finish a game,
> >>> which is approximately linear in the square of the board size.
> >>>
> >>> Fuming
> >>>
> >>>>
> >>>> - What FPGA? Virtex-6? Spartan-6?
> >>>> - What size is the core in LUTs?
> >>>> - Is your cycle time linear in the board size or in the number of
> >>>> squares (i.e. quadratic in board size)? Or something else?
> >>>>
> >>>> --
> >>>> GCP
> >>>> _______________________________________________
> >>>> Computer-go mailing list
> >>>> [email protected]
> >>>> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
