On Nov 30, 2008, at 11:49 AM, Mark Boon <[EMAIL PROTECTED]> wrote:

Indeed, the scaling question is very important. Even though I think I have AMAF/RAVE working now, it's still not so clear-cut what it's worth. With just 2,000 playouts I'm seeing an 88% win-rate against plain old UCT tree-search without RAVE. At 10,000 playouts this win-rate drops to 75%, and at 50,000 to 69%. All of these results have a margin of error of a few points, but the trend is obvious: UCT plays weaker than UCT+RAVE, but it scales a little better. This doesn't necessarily mean they converge. From the few data-points that I have, it looks like UCT+RAVE might converge to a winning rate of 66% against plain UCT search with playouts in the hundred-thousands or millions. Is that about 100 ELO points? That in itself would be justification enough to keep it. But there's a computation cost as well. Plus, as soon as you start to introduce other move-selection procedures, they may eat into the gain RAVE provides even further.
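
As a quick aside on the Elo question: under the usual logistic Elo model a 66% score works out to roughly 115 Elo, so "about 100 points" is in the right ballpark. A small Python sketch of the conversion (the helper name is just for illustration):

    import math

    def elo_diff(win_rate):
        # Logistic Elo model: an expected score p corresponds to a
        # rating difference of -400 * log10(1/p - 1).
        return -400.0 * math.log10(1.0 / win_rate - 1.0)

    print(elo_diff(0.66))  # ~115 Elo
    print(elo_diff(0.60))  # ~70 Elo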

Anyhow, the way I have it set up now I can easily switch between using AMAF information to compute RAVE or not. There are also still some parameters to tune, so this is not the last word on it by far; it's more like a good starting point. Also, even if it's not something to use in a final playing engine, it's good to have a baseline that provides the best possible time/strength combination to run quick tests against.

Is there actually a well-understood basis for the diminishing return of UCT+RAVE vs. UCT? I have given it a little thought, but it's not entirely obvious to me why UCT+RAVE wouldn't scale better than what I'm seeing.

I've also run into a few 'fluke' results. Winning streaks of a dozen games in a row (or more) happen between equally strong programs. So to be reasonably sure I'd like to get about 1,000 games. If you want to make sure two implementations are equivalent, as in the case of the ref-bots, I'd recommend 10,000 games.
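
To put rough numbers on that: near a 50% win rate the 95% margin of error is about 1.96 * 0.5 / sqrt(#games), which comes to roughly +/-10 percentage points at 100 games, +/-3 at 1,000 and +/-1 at 10,000. A quick Python sketch (the helper name is just for illustration):

    import math

    def margin_95(games):
        # 95% two-sided margin of error on a measured win rate near 50%,
        # using the normal approximation: 1.96 * 0.5 / sqrt(games).
        return 1.96 * 0.5 / math.sqrt(games)

    for n in (100, 1000, 10000):
        print(n, round(100 * margin_95(n), 1), "percentage points")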

If all I want to know is whether something is an improvement or not, then I usually settle for fewer games. If after a (few) hundred games I see a win-rate of 50% or less, I decide it's not an improvement (not one worth anything anyway); if I see a win-rate of around 60% or more, I keep it. Anything in between, I might let it run a bit longer. The improvements that I keep I run with longer thinking times overnight to see if they scale. After all, the only real test worth anything is under realistic playing circumstances.

You've claimed to be non-statistical, so I'm hoping the following is useful... You can compute the confidence that you made an improvement as:
Phi(# of standard deviations)
Where # of standard deviations =
(win rate - 0.5) / (0.5 / sqrt(#games))

Phi is the cumulative normal distribution. It has no simple closed form (it is usually expressed in terms of erf), so in practice people use lookup tables to translate between standard deviations and confidence levels, or simply set a goal confidence up front and translate it into a number of standard deviations (3.0 for about 99.87%). Since you only care whether the change is an improvement, this is a one-tailed test.

After about 20 or 30 games this normal approximation is accurate enough, and it can be used for early termination of your test.
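
For anyone who wants to plug numbers in directly, here is a minimal Python sketch of the calculation above (the function name and the 120-out-of-200 example are just for illustration):

    import math

    def improvement_confidence(wins, games):
        # One-tailed confidence that the true win rate exceeds 50%,
        # using the normal approximation to the binomial.
        win_rate = wins / games
        # Under the null hypothesis (no improvement) the standard deviation
        # of the observed win rate is 0.5 / sqrt(games).
        z = (win_rate - 0.5) / (0.5 / math.sqrt(games))
        # Phi(z), written via erf, is the one-tailed confidence level.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    # 120 wins out of 200 games (60%) gives z ~ 2.83, i.e. ~99.8% confidence.
    print(improvement_confidence(120, 200))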





Mark

On 29-nov-08, at 11:32, Don Dailey wrote:

On Sat, 2008-11-29 at 11:58 +0100, Denis fidaali wrote:

From my own experience, an important test case whenever trying to get AMAF to work is the scaling study.


No truer words ever spoken.  This is one of the secrets to strong programs: if they scale, they are probably soundly designed. I do that with chess. I find that some program changes scale up, particularly sound algorithms that reduce the branching factor. I have to run tests pretty fast in order to get results I can interpret, but I also plot the results visually with gnuplot.

As many here will recall,  my own Fatman study vs Leela showed that
Leela scaled better with increasing depth than Fatman. Nothing like a
graph to reveal this very clearly, although you can also look at the
numbers if you are careful.

It's important to point out that you will be completely misled if you
don't get enough samples.  It's very rare that 100 or 200 games are
enough to draw any conclusions (unless the result is really lopsided). I
remember once thinking I had found a clear, scalable improvement; I
decided it had to run longer, but I was hopeful.  When the
improvement started to decline, I discovered that I had accidentally been
running the exact same program against itself.

The point is that it is not uncommon to get really "lucky", and have an
equal program look substantially superior - for a while.

- Don


My prototype was quite strong considering that it used only 1,000 light playouts
(it scored 25-30% wins against GNU Go level 0), but it didn't seem to get much
beyond that as the number of playouts grew ... (It also had a serious exponential
complexity problem, which I never got around to investigating :) )

I know that Zoe was about 2000 Elo with, I think, 50k simulations, and ... never
got any real better as the number of simulations increased.

Both prototypes were toying with AMAF, so I really think you need a bit of a
scalability study whenever trying to assess an engine employing it. (Although it
could very well be that the scalability trouble came from some nasty bugs; both
aforementioned prototypes were quite messy ...)

From: [EMAIL PROTECTED]
Subject: Re: [computer-go] RAVE formula of David Silver (reposted)
Date: Sat, 29 Nov 2008 03:39:58 -0200
To: computer-go@computer-go.org
CC:


On 28-nov-08, at 17:28, [EMAIL PROTECTED] wrote:

I would be very interested to see the RAVE code from Valkyria. I'm
sure others would be too.


I'm much more interested in a general, concise description. If such a
description cannot be given easily, then I think there's little point in
including it in the definition of an MCTS reference engine.

I found a serious flaw in my code collecting the AMAF scores, which
explains why I wasn't seeing any gains so far with AMAF turned on.
Now over the first 100+ games UCT+RAVE scores 90% over plain UCT. I'm
going to run a test overnight, but so far it looks good. It should
have collected a few thousand samples by tomorrow.

Hopefully next week I can go back to testing my list of playout
improvements, which is why I started making the MCTS reference
implementation in the first place. This RAVE stuff caused a bit of a
distraction, but it's nice to have if it works.

Mark

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
