Pebbles gains more than 50 rating points per doubling. I am mentally using
3x = 150 rating points on 9x9. Fuego and Mogo seem to have similar
behaviors.
I agree with Don that current computer engines are nowhere near topping out.
Speed will be significant for a long, long time.
My belief is that light playouts are doomed.
E.g., the program myCtest-50k-UCT, which seems to use 50K playouts and the
UCT algorithm, has a Bayeselo of 1683 on CGOS 9x9.
Mogo3 with 1K: 1847
Mogo3 with 3K: 2010
Mogo3 with 10K: 2152
Mogo3 with 30K: 2297
Here is Fuego:
Fuego-1418-100: 1538
Fuego-1418-1000: 1970
Fuego-1418-10000: 2294
Fuego-1418-100000: 2548
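As a quick sanity check on the per-doubling figures, the numbers above can be fit directly (a sketch; the ratings and playout counts are exactly those listed, the function name is mine):

```python
import math

# Bayeselo figures quoted above on CGOS 9x9, keyed by playout count.
mogo = {1000: 1847, 3000: 2010, 10000: 2152, 30000: 2297}
fuego = {100: 1538, 1000: 1970, 10000: 2294, 100000: 2548}

def elo_per_doubling(ratings):
    """Average Elo gained per doubling of playouts over the listed range."""
    lo, hi = min(ratings), max(ratings)
    return (ratings[hi] - ratings[lo]) / math.log2(hi / lo)

print(round(elo_per_doubling(mogo), 1))   # about 92 Elo per doubling
print(round(elo_per_doubling(fuego), 1))  # about 101 Elo per doubling
```

Both programs come out well above 50 Elo per doubling over these ranges, consistent with the 3x = 150 rule of thumb.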
It is literally true that Mogo's and Fuego's knowledge is worth more than a
factor of 50 in playouts over myCTest: Mogo3 with only 1K playouts already
outrates myCtest with 50K.
Now, much of that knowledge takes the form of tree search heuristics. It is
not clear how much you can strip out of the playout policy. At a minimum the
playout policy must represent
- Atari/self-atari rules sufficient to handle seki
- Move selection based on local patterns
This would already be challenging to add to a GPU kernel.
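To make the two bullet points concrete, here is a minimal sketch of such a playout policy. The board interface is purely hypothetical: `is_self_atari` and `matches_local_pattern` are stand-ins for whatever predicates a real engine's board representation provides, not any actual library's API.

```python
import random

# Hypothetical board interface: a real engine would compute these from the
# board position (liberty counts, 3x3 patterns around the last move).
def is_self_atari(board, move):
    return move in board.get("self_atari", set())

def matches_local_pattern(board, move):
    return move in board.get("pattern_moves", set())

def select_playout_move(board, legal_moves, rng=random):
    """Minimal 'heavier than light' policy: prefer local pattern matches,
    avoid self-atari, otherwise fall back to a uniform random legal move."""
    patterns = [m for m in legal_moves if matches_local_pattern(board, m)]
    if patterns:
        return rng.choice(patterns)
    safe = [m for m in legal_moves if not is_self_atari(board, m)]
    return rng.choice(safe or legal_moves)

# Toy example: two legal moves, one of which is self-atari.
board = {"self_atari": {"A1"}, "pattern_moves": set()}
print(select_playout_move(board, ["A1", "B2"]))  # prints "B2"
```

Even this much branching per move is awkward in a GPU kernel, where divergent control flow across threads in a warp serializes execution.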
From: [email protected]
[mailto:[email protected]] On Behalf Of Don Dailey
Sent: Wednesday, June 01, 2011 2:49 PM
To: [email protected]
Subject: Re: [Computer-go] Direct DX11 and graphics cards for cheaper
simulation hardware?
2011/6/1 Bojun Huang <[email protected]>
Hi Don,
my replies are inline. Thanks.
Bojun
>Date: Wed, 1 Jun 2011 10:59:31 -0400
>From: Don Dailey <[email protected]>
>To: [email protected]
>Subject: Re: [Computer-go] Direct DX11 and graphics cards for cheaper
> simulation hardware?
<mailto:[email protected]%3E%3E+I> > I don't think there is a widely
accepted standard benchmark, but I think
>there are MCTS bots that do light playouts and we could probably "pick" one
>to consider a reference implementation. Some programs may provide the
>option for light playouts, or perhaps could make this an option. What
>do you mean by "light playouts?" One definition is any legal move is
>played with equal probability and another
definition is "less heavy."
I refer to the latter. Actually, my question is about defining what
"light playout" means, or, how much "less heavy" is acceptable for the
program to keep showing state-of-the-art performance, with the tradeoff
between speed and selectivity still in the meaningful range. Clearly,
selecting among all legal moves uniformly at random is of little value for
building really strong programs. But on the other hand, how "light" could
the playouts in a "strong" program be? Is there a strong program (not
necessarily a top program) that is well known to use light playouts?
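For reference, the strictest definition of a light playout (uniformly random legal moves until the game ends) can be sketched as follows. The `ToyBoard` class is purely illustrative; a real board tracks stones, liberties, ko, and passes.

```python
import random

class ToyBoard:
    """Stand-in for a Go board: terminal after a fixed number of moves.
    A real implementation would track the position and detect game end."""
    def __init__(self, moves_left=10):
        self.moves_left = moves_left
        self.history = []
    def is_terminal(self):
        return self.moves_left == 0
    def legal_moves(self):
        return ["A1", "B2", "C3"]
    def play(self, move):
        self.history.append(move)
        self.moves_left -= 1
    def score(self):
        return random.choice([+1, -1])  # win/loss from one side's view

def light_playout(board, rng=random):
    # The strictest "light" definition: uniform random legal moves to the end.
    while not board.is_terminal():
        board.play(rng.choice(board.legal_moves()))
    return board.score()

print(light_playout(ToyBoard()))
```

The "less heavy" definition keeps this loop but replaces the uniform `rng.choice` with some cheap bias, which is exactly where the line between light and heavy gets blurry.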
>>
>> Another question ... do more playouts really provide a *consistent*
>> improvement in the Elo score, especially for the strongest programs? I
>> remember that some programs running on laptops ranked very high in the
>> Olympiads, which seems to imply that speed simply doesn't matter here ...
>>
>
>More playouts always consistently improves a program. Like in computer
>chess, it takes a fair increase to be easily measurable so don't listen to
>anecdotal evidence that this is not so. You might hear someone say, "I
>doubled the number of playouts and it did not seem to play any better - or
>it lost a 10 game match despite the doubling." Doubling the number of
>playouts is not going to increase the strength enough that it will
guarantee
>a win in a 10 or 20 game match so when that happens some people will
>conclude that increasing the number of playouts has no effect on the
>strength, but that is nonsense.
>
>Don
>
Yeah, it's intuitive that more playouts would improve the program, but it
may also be intuitive that this improvement becomes more and more marginal
as the playing capability of the program increases.
This is a huge myth that for some reason is hard to shake, but it's just
not the case. Yes, there probably is a fall-off but it's so gradual I
doubt it can be detected in Go and there is no sudden point you can identify
where extra playouts suddenly stop helping.
I heard that doubling the number of playouts gives about 50 Elo, but this
kind of linear improvement does not seem to hold in my program once the
rating is beyond 2400. It seems additional playouts are better at turning
weak programs into average programs than at turning average programs into
strong programs.
The lesson learned from other games, especially chess, is that the fall-off
is very gradual. To this day, we STILL get a very nice rating improvement
from a doubling of speed but it's true that it's not what it was 20 years
ago. But the fact that we still get substantial improvement for doubling
the speed in chess even though computers are well beyond human skill levels
means we still have a long way to go and that in games like Go we are just
barely getting started on this curve. (Also, in chess at this
incredible level a lot of games are draws, so getting about 30 ELO or more
for a doubling is quite impressive.)
Everyone's intuition that there "should be" a sudden fall-off is nonsense
and it's based on our poor perception of time - we place WAY TOO MUCH weight
on the present and hardly any on the past and future. Thus, it was
once believed that NOBODY would ever run the 4 minute mile (because at that
exact moment in time it seemed impossible) and it seems that all progress
in computer go has stopped (because programs are not much better than they
were 2 months ago.) But you have to look at how much progress we made in
2 years, not 2 months.
There is another perception problem too, because a doubling does not make a
difference that is immediately obvious or noticeable, as I have already
stated. So you could run 200 games and not see an improvement (or see only
a very small improvement) and thus conclude that you have suddenly hit the
wall! But I doubt you are going to run the thousands of games needed with
a doubling of playouts because it would take more time than we are
comfortable with (again, a human perception factor here.) So I think
it's likely that you don't fully understand what you are looking at.
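The point about match length can be made quantitative with the standard Elo logistic model. This is a rough sketch ignoring draws; the helper names are mine.

```python
import math

def win_prob(elo_diff):
    """Expected score of the stronger side under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400.0))

def games_to_detect(elo_diff, z=1.96):
    """Rough number of games for the observed score to sit z standard
    errors above 50% (two-sided 95% at z=1.96), ignoring draws."""
    edge = win_prob(elo_diff) - 0.5
    return math.ceil((z * 0.5 / edge) ** 2)

print(games_to_detect(50))  # under 200 games to resolve a 50-Elo edge
print(games_to_detect(20))  # well over 1000 games for a 20-Elo edge
```

So a 200-game match is only just enough to resolve the gain from one doubling, and anything smaller vanishes into the noise, which is exactly why short matches mislead people into seeing a wall.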
There is one possibility however, which Magnus touched on. It is possible
that some MCTS programs have something missing that negatively affects
scalability. It could be an outright coding bug, or something else.
In theory (and in practice), you should asymptotically approach perfect
play as you increase the number of playouts. If you could do infinite
playouts and still not get perfect play then something is broken. It might
not affect the strength very much when you are doing a few thousand playouts
but it might have a huge impact when you are doing a million and thus you
will experience a (more) rapid decline in benefit for extra playouts at some
point.
This could even be an issue with computer chess with respect to null move
pruning and Graph History Interaction issues, because there are some
positions that may not be solvable no matter how many nodes you look at.
Don
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go