On Mon, Dec 15, 2008 at 5:47 PM, Don Dailey <[email protected]> wrote:
> Is Jrefgo the pure version that does not use tricks like the futures
> map? If you use things like that, all bets are off - I can't be sure
> this is not negatively scalable.
I don't know, although I was under the impression that I had downloaded
the "pure" version. I found a reference to the source here on the list,
and downloaded and compiled that. When I get back home, how would I
quickly determine which is the case?

> You cannot draw any reasonable conclusions by stopping after 10 moves
> and letting gnugo judge the game either. Why didn't you play complete
> games?

I think that complete games would have to be at least one of:

1. Against a similarly weak opponent. This casts doubt on whether the
results apply against other opponents.

2. Unlikely to be won by an AMAF program. This makes their differences
hard to measure.

3. Played with handicap stones. The granularity seems too coarse on
9x9. Nevertheless, it might be worthwhile to try this.

4. Played with a komi that is very far from an even game. In 9x9, this
would mean that the better player must control the entire board in
order to win. At that point, komi is no longer useful as a means of
providing a handicap.

Originally (about two years ago), I ran studies such as this in order
to tune parameters that affected the playouts, and that I thought could
probably have different optimum values at different points in the game.
Playing against an opponent that is generally stronger makes it more
likely that the improvements I find apply to opponents in general,
rather than simply tuning my program against one particular opponent.
Playing against a close relative of the same program (e.g., pitting 5K
against 100K directly) gives misleading results, in my experience.
Often, both programs will be blind to the same lines of play, allowing
genuinely bad moves to go unpunished.

On Mon, Dec 15, 2008 at 6:10 PM, Mark Boon <[email protected]> wrote:
> It would have been much more persuasive if you had simply run a 5K
> playout bot against a 100K bot and see which wins more. It shouldn't
> take much more than a day to gather a significant number of games.
I may do that, although personally I would be far more cautious about
drawing conclusions from those matches than from ones played against a
strong reference opponent. But I guess other people feel differently
about this. Anyway, the results would still be interesting to me no
matter which way they went, even if they failed to convince me of
anything.

> I did in fact put up a 100K+ ref-bot on CGOS for a little while, and
> it ended up with a rating slightly (possibly insignificantly) higher
> than the 2K ref-bot. Maybe I didn't put it there long enough,
> certainly not for thousands of games. But it didn't look anywhere near
> to supporting your findings.

That doesn't particularly disagree with my conclusions either. For
example, I would guess that the best overall performance is somewhere
around 5K-10K playouts, so a program with a setting in that range would
obtain a higher rating than either the 2K bot or your "100K+" bot. I
could easily be wrong about that, though.

> I say 100K+ because I didn't set it to a specific number, just run as
> many as it could within time allowed. Generally it would reach well
> over 100K per move, probably more like 250K-500K. That should only
> make things worse according to your hypothesis.

Yes, this is what sparked the conversation originally. When you
reported that a while ago, my reaction was, "Of course that won't work
very well; you're running way too many simulations." I was actually a
bit surprised that no one else thought this was as bad as I think it
is.

> So although I think the result of your experiment is very curious, I
> think it might be a bit hasty to draw your conclusion.

Yes, it very well may be. As I mentioned, I ran a number of similar
experiments a couple of years ago, for which I unfortunately lost the
results. My recollection is that they typically indicated the same
thing, across a number of variations on my own program.
Performance would improve up to a point, then degrade as the program's
behavior became essentially deterministic. But I may have made mistakes
in those tests, or I could be misremembering.

On Tue, Dec 16, 2008 at 12:20 PM, Don Dailey <[email protected]> wrote:
> A monte carlo bot like refbot, in most positions, is going to converge
> on some specific move. I think in the starting position it "wants" to
> play e5, and it is going to play e5 with an infinite number of
> playouts, whether that is the best move or not. There will be many
> situations where the move it "wants" to play is not the best, and so
> you can surmise that it's more likely to play a good move with fewer
> playouts.

Incidentally, when I get home, I'll post the line of play that follows
the moves with the highest (asymptotic) Monte Carlo values, according
to jrefgo. I have about 18 moves calculated with high accuracy.

> However, that by no means implies that it will play better with fewer
> playouts. It may play the worst move on the board too - the chances of
> that happening increase as the number of playouts drops. So this cuts
> both ways.

Yes, it certainly does cut both ways. But I think it should not be too
hard to convince yourself that if the worst move on the board happens
to have a low Monte Carlo value, the program will detect this much more
quickly than it will detect the comparatively slight differences
between the top few moves. The move with the best Monte Carlo value is,
in my understanding, a reasonable-looking move that nevertheless often
demonstrates clear "misconceptions" of that methodology. I'm sure that
it's rare for it to be the worst move on the board, but I also think it
is generally uncommon for it to be the best move.

Here's another experiment that I might try: Pick a few individual
positions from some games played by high-level players.
Then, calculate very accurate Monte Carlo values for the moves from
that position, as well as win rates for those moves, based on top
programs. From that information, I should be able to quickly run
simulations to find an overall expected win rate that would result from
playing the current best move of a Monte Carlo program, followed by the
moves played by the reference program. As the accuracy of the Monte
Carlo program increases, do more of these approach their asymptote from
above or below? (Is it possible that this depends only on the relative
win rates of the top two moves?)

> Another way to look at this is that "playing better" may not correlate
> well with increasing your chances of playing the best move - it's more
> about increasing your chances of playing a "good" move, or in my own
> personal opinion it's about avoiding bad moves.

I think that last part is exactly right. And I believe that current
Monte Carlo methods only really manage to avoid the very worst of the
bad moves, regardless of how many playouts they run.

Weston

_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
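P.S. To make Don's convergence point concrete, here is a small Python
sketch of a flat Monte Carlo move chooser. The win rates below are
invented for illustration (they are not jrefgo's actual values); the
only point is that as the playout budget grows, the chooser settles on
whichever move has the highest asymptotic value, whether or not that
move is objectively best:

```python
import random

def choose_move(true_winrates, playouts_per_move, rng):
    """Flat Monte Carlo: estimate each move's value by sampling
    independent Bernoulli playouts, then pick the best estimate."""
    best_move, best_est = None, -1.0
    for move, p in enumerate(true_winrates):
        wins = sum(rng.random() < p for _ in range(playouts_per_move))
        est = wins / playouts_per_move
        if est > best_est:
            best_move, best_est = move, est
    return best_move

def top_move_frequency(true_winrates, playouts_per_move, trials, seed=0):
    """How often the asymptotically preferred move (the index of the
    maximum true value) is actually chosen at a given playout budget."""
    rng = random.Random(seed)
    target = true_winrates.index(max(true_winrates))
    picks = [choose_move(true_winrates, playouts_per_move, rng)
             for _ in range(trials)]
    return picks.count(target) / trials

# Hypothetical "true" Monte Carlo values for five candidate moves.
# Move 0 has the highest asymptotic value, so an infinite number of
# playouts always selects it -- whether or not it is the best move.
winrates = [0.55, 0.50, 0.48, 0.40, 0.20]

for n in (10, 100, 2000):
    f = top_move_frequency(winrates, n, trials=100)
    print(f"{n:>5} playouts/move: preferred move chosen {f:.0%} of trials")
```

At 10 playouts per move the selection is quite noisy; by a few thousand
it is essentially deterministic, which is the behavior I described as
"degrading" once it sets in.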
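P.P.S. The experiment I describe above can be sketched in a few lines.
Every number here is an invented placeholder (the real inputs would be
the accurate Monte Carlo values and program-based win rates mentioned
above); the sketch only shows how the expected win rate, as a function
of the playout budget, can approach its asymptote from above when the
move preferred by the Monte Carlo values is not the move with the best
actual win rate:

```python
import random

# Hypothetical per-move data for one position: each candidate move has
# a "true" Monte Carlo value (what the playouts converge to) and an
# actual win rate (how often playing it really wins). Invented numbers.
MOVES = [
    # (monte_carlo_value, actual_win_rate)
    (0.54, 0.45),   # the MC-preferred move is not the objectively best one
    (0.50, 0.60),
    (0.48, 0.50),
    (0.40, 0.35),
]

def expected_win_rate(moves, playouts, trials, seed=0):
    """Average win rate that results from letting a flat Monte Carlo
    chooser with a finite playout budget pick the move."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        best_idx, best_est = 0, -1.0
        for i, (mc_value, _) in enumerate(moves):
            wins = sum(rng.random() < mc_value for _ in range(playouts))
            est = wins / playouts
            if est > best_est:
                best_idx, best_est = i, est
        total += moves[best_idx][1]
    return total / trials

# Infinite playouts always pick the top Monte Carlo value (move 0), so
# the asymptotic expected win rate is that move's actual win rate.
asymptote = MOVES[0][1]
for n in (20, 200, 2000):
    ewr = expected_win_rate(MOVES, n, trials=200)
    print(f"{n:>5} playouts: expected win rate {ewr:.3f} "
          f"(asymptote {asymptote})")
```

With these made-up numbers the curve approaches the asymptote from
above, because low playout counts sometimes stumble onto the move with
the better actual win rate.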
