Also, this rating is calculated by self-play, so it overstates the true
rating.  The asymptotic rating is achievable with today’s hardware, but
Fuego on that many cores (my assumption) is not as strong as a professional
Go player.

 

David

 

From: [email protected]
[mailto:[email protected]] On Behalf Of Andy
Sent: Monday, June 20, 2011 1:35 PM
To: [email protected]
Subject: Re: [Computer-go] Thoughts about bugs and scalability

 

It's more than 100 vanilla Elo points per rank.

 

The KGS rating algorithm sets 1 rank as:

~139 for 30k-5k  (70% win rate expected in even games between 5k and 6k)

~226 for 2d+      (79% win rate expected in even games between 2d and 3d)

(Sliding scale between 5k and 2d.)

 

EGF uses a similar sliding scale for expected winning percentages:

80% win rate expected between 6d and 7d

64% win rate expected between 29k and 30k 
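Those win rates map onto Elo gaps via the standard logistic Elo formula; a quick sketch (note that KGS and EGF use their own rating curves, so the ~139/~226 figures above only roughly match the logistic numbers):

```python
import math

def elo_gap(win_rate):
    """Elo difference implied by an expected win rate, using the
    standard logistic Elo model.  (An assumption for illustration:
    KGS and EGF use their own curves, so their per-rank numbers
    differ somewhat from these.)"""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))

# 70% expected win rate (KGS, 5k vs 6k)  -> ~147 Elo
# 79% expected win rate (KGS, 2d vs 3d)  -> ~230 Elo
# 64% expected win rate (EGF, 29k vs 30k) -> ~100 Elo
for p in (0.70, 0.79, 0.80, 0.64):
    print(f"{p:.0%} win rate -> {elo_gap(p):.0f} Elo")
```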

 

On Mon, Jun 20, 2011 at 3:20 PM, Don Dailey <[email protected]> wrote:

Richard,

 

You said the performance asymptote is far from perfect play,  but your
projection in the paper is an  ELO of  4154 ABOVE the level of Fuego playing
at 1 minute per game.   

 

It has been estimated that 100 ELO is about 1 dan at the low-kyu/high-dan
levels, and even less at higher levels.    This implies by your own
calculation that there are at LEAST 41 ranks between the 1-minute Fuego
player and the asymptote that you estimated.   

 

That "back of the envelope" calculation is crude, I admit, but it seems to
indicate that an ELO rating 4154 points above Fuego at 1 minute is FAR
above the best player.   

 

So what is the basis of your statement that the asymptote is "far from
perfect play?"

 

I must have missed this in the paper as I read it quickly,  but I did not
see any claims in the paper about this,  which I considered a good thing
since we don't know where perfect play is. 

 

Don

 

 

 


 

On Mon, Jun 20, 2011 at 4:03 PM, Richard B Segal <[email protected]> wrote:

Hi All,

As my scalability paper has come up several times in the last few weeks, I
thought it would be useful to give a brief overview of the results relevant
to the current conversation and provide additional insights.

The first set of experiments analyzed the scalability of Fuego on a single
core machine without multithreading. The experiments consisted of 1,000
self-play games in which one player received N minutes for all moves and the
second player received 2*N minutes for all moves. The resulting scaling
graph looks qualitatively similar to the recent scaling graph posted for
Pachi. The important observation is that the ELO gained between each
successive doubling decreases as search time increases. If we assume this
trend continues, then the ELO gained for each doubling will converge to zero
forcing absolute performance to level off. Ideally, this asymptote would not
occur until perfect play is reached. Unfortunately, for Fuego, fitting an
exponential curve to the data suggests a performance asymptote that is far
from perfect play. 
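The asymptote argument can be made concrete with a small sketch. If the Elo gained at each successive doubling decays geometrically, the total gain converges to a finite limit. The parameters below are made up for illustration; they are not the fit parameters from the paper:

```python
# Hypothetical numbers, not the paper's actual fit: first doubling
# gains 120 Elo, and each later doubling gains 85% of the previous.
g0, r = 120.0, 0.85

def gain_after(doublings):
    """Cumulative Elo gained after the given number of doublings,
    assuming geometric decay of the per-doubling gain."""
    return g0 * (1 - r**doublings) / (1 - r)

# The geometric series converges to g0 / (1 - r), so absolute
# performance levels off no matter how much time is added.
asymptote = g0 / (1 - r)
print(f"asymptotic gain: {asymptote:.0f} Elo")   # 800 Elo with these numbers
for n in (1, 5, 10, 20, 40):
    print(f"after {n:2d} doublings: {gain_after(n):6.1f} Elo")
```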

This disappointing result raises lots of questions and caveats. The results
are specific to Fuego. I suspect all programs will display a similar shaped
scaling curve and asymptote. However, better designed programs should
asymptote to better performance. The best programs may asymptote to perfect
play. The results are for self-play with all its inherent limitations.
However, I suspect self-play will overestimate true scaling, so the results
are still probably an upper bound on Fuego's true performance. Sufficient
memory was available and did not affect the results. All results are for
19x19 play.

The level of play that Fuego appears to asymptote to is well below expert
play and may even be below the performance of the top programs such as Zen
and Erica. When Fuego is run on our 32-core / 128-thread competition machine,
the above analysis suggests Fuego is within 500 ELO of its maximum possible
play quality. My (almost random) guess is that on competition hardware Fuego
is about 200 to 300 ELO weaker than Zen and Erica.

The big question is why this asymptote exists at such a weak level of play.
Unfortunately, I do not have a definitive answer to this question, but a few
observations are important. I agree with much of the earlier discussion that
any well-written MCTS implementation will eventually converge on optimal
play. I believe I can prove that any MCTS implementation that meets the
following criteria:

1) Eventually includes all possible legal move sequences in the tree.
2) Correctly scores terminal positions
3) Selects the highest scoring node an infinite number of times
4) Never lets a node's score drop to zero unless it is proven a loss.

will eventually achieve optimal play. The above applies to traditional UCT,
greedy versions of UCT without an exploration term, and most RAVE variants.
Note: I suspect there is some confusion about how quickly the above
converges. The convergence of MCTS is much worse than linear in the size of
the search space. What happens when a move looks good for one million trials
and then a refuting response is found? MCTS may need to revisit the refuting
node millions of times to reduce the score of the parent move to a low
enough value for an alternative move to be preferred. MCTS has been shown to
have a worst-case convergence of 2^(2^(2^...)), where the number of
exponentiations is linear in the tree depth.*
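The "million trials, then a refutation" scenario can be quantified with a toy model. Suppose (a deliberate simplification) that once the refutation is found, every new playout through the refuted move loses; the question is how many such visits it takes before the move's running mean drops below a sibling's:

```python
def visits_to_flip(n, mean, alt_mean):
    """Number of consecutive losing playouts needed before a node
    with `n` visits and win rate `mean` drops below a sibling whose
    win rate is `alt_mean`.  Simplifying assumption: every new
    playout through the refuted node now loses."""
    k = 0
    wins = n * mean
    while wins / (n + k) >= alt_mean:
        k += 1
    return k

# A move that looked good (55%) for a million trials needs roughly
# another 100,000 losing visits before a 50% sibling overtakes it.
print(visits_to_flip(1_000_000, 0.55, 0.50))
```

This is the mild case; the doubly-exponential worst case arises when such flips have to propagate up the tree level by level.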

My guess as to why the asymptote exists at such a weak level of play is a
combination of problems with Fuego and the scalability of MCTS as I describe
above. The scaling of MCTS is so poor that it may be impossible, from simple
curve fitting, to distinguish an asymptote from an extremely slowly improving
curve. I suspect the reason that Fuego hits its asymptote so early in the
play-quality curve is due to limits in its design. Programs that can get
more value out of a fixed-size tree can achieve higher levels of play
before the awful scaling of MCTS takes over and prevents further practical
improvement. 

- Rich


* P. Coquelin and R. Munos. Bandit Algorithms for Tree Search. UAI 2007.



[email protected] wrote on 06/20/2011 02:23:48 PM:

> [image removed] 
> 
> Re: [Computer-go] Thoughts about bugs and scalability
> 
> René van de Veerdonk 
> 
> to:
> 
> computer-go
> 
> 06/20/2011 02:24 PM
> 
> Sent by:
> 
> [email protected]
> 
> Please respond to computer-go


> 
> I am surprised you were asked to pay for the paper, I was not asked 
> to pay for the download (that's why I felt okay to provide the link).
> 
> From the conclusion:
> "We first analyzed the single-threaded scaling of Fuego and found 
> that there is an upper bound on the play-quality improvements that 
> can come from additional search. This in itself limits the potential
> advantage of Blue Gene/P as its total processing power in a one-hour
> timed game easily exceeds the computation needed to maximize Fuego’s
> overall play quality."
> 
> René
> 
> On Mon, Jun 20, 2011 at 10:35 AM, Don Dailey <[email protected]> wrote:
> First of all,  I never purchase papers of unknown quality.   If I 
> know a paper is great or that it comes highly recommended by someone
> I trust I will purchase it although the idea of purchasing papers in
> general is distasteful to me.  
> 
> The scalability issue as you summarize has to do with the number of 
> cores - I am not surprised that more cores bring less improvement 
> and that is true in ALL games.    That's not the issue I am talking 
> about here, although it's certainly a legitimate practical issue.    
> 
> To better understand the issue,  forget about the specific machine 
> it's on and substitute the phrase "scale with time."   
> 
> There is no scaling issue with reasonably written programs that can 
> be tested with current hardware.    Showing that it's difficult to 
> parallelize an algorithm that is basically serial does not prove 
> anything here.   
> 
> I have explained why people believe MCTS may have limits and why 
> they are wrong.   All such "evidence" that we have hit some wall is 
> anecdotal.   Unless of course you know of someone who has run a few 
> thousand games at 1 day per move on a single-core machine (or
> equivalent).
> 
> Don
>   
> 
> On Mon, Jun 20, 2011 at 1:12 PM, René van de Veerdonk <
> [email protected]> wrote:
> On Mon, Jun 20, 2011 at 8:27 AM, Don Dailey <[email protected]> wrote:
> On Mon, Jun 20, 2011 at 10:09 AM, terry mcintyre <[email protected]
> > wrote:
> Any particular instance of a program will probably fail to scale - 
> especially against humans who share the lessons of experience. 
> 
> That is complete nonsense.   How are you backing this up?   What 
> proof do you have that computers don't play better on better 
> hardware?   Why are the top programs being run on clusters and 
> multi-core computers?   Are the authors just complete idiots?   
> 
> Every bit of evidence we have says they are scaling very well 
> against humans.     That has also been our experience in game after 
> game,  not just in Go.     
> 
> I apologize for being so harsh on this,  but you are too smart to be
> saying such dumb things.   
>  
> Please read (from Darren Cook):
> 
> Richard Segal (who operates Blue Fuego) has a paper on the upper limit
> for scaling:
> http://www.springerlink.com/content/b8p81h40129116kl/
> (Sorry, I couldn't find an author's download link for the paper; Richard
> is on the Fuego list but I'm not sure he is even a lurker here.)
> 
> Another paper that mentions the topic (also Fuego specific) is here:
> http://webdocs.cs.ualberta.ca/~mmueller/ps/fuego-TCIAIG.pdf
> 
> The short conclusion of these papers is that it was not worth it to 
> scale Fuego to an IBM Blue Gene machine (the original purpose of the 
> work), as it didn't get any better anymore, i.e., performance didn't
> (appreciably) scale beyond 512 or so cores. Obviously, that's still 
> a big improvement over a desktop PC, but far from the capacity of 
> the supercomputer available.
> 
> Zen also has a similar issue according to its co-author Hideki Kato.
> That's evidence enough for me to indicate that there indeed is an 
> issue with the current breed of programs.
> 
> It appears that current specific implementations have a scaling 
> ceiling. I expect, as expressed by multiple people here, that once 
> that ceiling gets close enough to become a limitation, people will 
> change their algorithms accordingly and scaling will continue. But 
> it will only happen once you can comfortably test the program in 
> that resource regime.
> 
> René
> 
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
> 
> 



 



 
