Fuming and Petr,

I was giving the tree-search some thought over the weekend and made some
crude estimates based on my own MCTS implementation. I did not make fresh
timing measurements, just added/subtracted some numbers to get to where I
want to be. Assuming you want to maintain RAVE and win-rate information, I
am guesstimating that for my quasi-heavy playouts I find an overhead of
about 17% for the tree-search part (50,000 playouts total). My quasi-heavy
playouts run at 20,000 playouts/second, and I am estimating that if you
appropriately break the tree-search portion in descend, update, copy, and
misc threads, you can get the overhead to come down to, say, 5%. Then, if
you completely outsource the playouts to the FPGA, i.e., you make them
cost-free (ignoring communication costs), you would be able to boost the
playouts twenty-fold, to 400,000, at which point you become CPU limited.
Assume further that my coding can be further improved by another 2x (not
unreasonable at all), and you get to 800,000.

Thus, with an FPGA that delivers 1,500,000 playout/sec, you have a fairly
serious challenge ahead of you if you want to update the tree for individual
playouts and maintain RAVE information. Note that I only used 50,000 total
playouts to measure the overhead, which is an underestimate if you have this
much processing power available. In addition, if you move the tree-part
off-FPGA, you may be able to get even faster playouts (e.g., two streams in
parallel). Not to mention that FPGA's with 4x faster clocks already exist,
some may be bigger, and you may be able to put a number of them in your
system.

I would say managing this much raw simulation power would require some key
new insight (like Lukasz's epsilon trick proposal), or a reduction of effort
by bunching things together (such as leaf parallellism). Perhaps the
cluster-folks have something valuable to add to this, but I would say that
for a single compute system this is unexplored territory.

Fuming, are you planning to run a version of your program on KGS/CGOS at
some time in the future? Than we can all see it in action.

René

On Mon, Jun 21, 2010 at 7:13 AM, Fuming Wang <[email protected]> wrote:

>
> Hi Petr,
>
> What I mean is that games are simulated in parallel, but are one clock
> cycle apart from each other.
>
> On Sun, Jun 20, 2010 at 8:51 AM, Petr Baudis <[email protected]> wrote:
>
>>  Hi!
>>
>> On Thu, Jun 17, 2010 at 10:04:58PM +0800, Fuming Wang wrote:
>> > > Do all the games have to start synchonously?
>> > >
>> > >
>> > Games are started one clock at a time.
>>
>>   I'm sorry, I don't quite understand that - does that mean that
>> different games can start at different times?
>>
>> > I do not quite understand your description.  I'm guessing that you mean
>> to
>> > have n playouts at a time instead of one playout at a time like the
>> regular
>> > UCT algorithm. Nonetheless, FPGA based method are very static in
>> structure,
>> > and can not adapt easily to the dynamic changes of a game tree like a
>> > software implemented tree structure. So I don't know how to implement
>> UCT
>> > type dynamic tree search algorithms in FPGA right now.
>>
>>   I haven't meant the tree search to be done on the FPGA at all, I have
>> merely tried to draft a method to effeciently queue games to be played
>> on the FPGA next. Are you planning to do the alpha-beta search in the
>> FPGA?
>>
>>
> I see now.  You were thinking about sending games to the FPGA and using the
> simulation results in the main computer.  This is not what we are doing
> right now, we are doing the tree search in the FPGA.
>
> Fuming
>
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

Reply via email to