It's more than 100 vanilla Elo points per rank. The KGS rating algorithm
sets 1 rank as:

  ~139 Elo for 30k-5k (70% win rate expected in even games between 5k and 6k)
  ~226 Elo for 2d+ (79% win rate expected in even games between 2d and 3d)

(Sliding scale between 5k and 2d.)
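[Editor's note: the rank widths above follow from the standard logistic Elo
expectancy formula; a minimal sketch, not part of the original thread,
checking the quoted win rates:]

```python
# Standard logistic Elo win expectancy; used here only to verify the
# ~70% and ~79% figures quoted for the KGS rank widths above.

def win_expectancy(elo_diff):
    """Expected score for the stronger player at a given Elo gap."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

print(round(win_expectancy(139), 2))  # 0.69 -- matches ~70% for 5k vs 6k
print(round(win_expectancy(226), 2))  # 0.79 -- matches 79% for 2d vs 3d
```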
EGF uses a similar system to slide the winning percentages:

  80% win rate expected between 6d and 7d
  64% win rate expected between 29k and 30k

On Mon, Jun 20, 2011 at 3:20 PM, Don Dailey <[email protected]> wrote:
> Richard,
>
> You said the performance asymptote is far from perfect play, but your
> projection in the paper is an ELO of 4154 ABOVE the level of Fuego playing
> at 1 minute per game.
>
> It has been estimated that 100 ELO is about 1 dan at the low-kyu/high-dan
> levels, and even less at higher levels. This implies, by your own
> calculation, that there are at LEAST 41 ranks between the 1-minute Fuego
> player and the asymptote that you estimated.
>
> That "back of the envelope" calculation is crude, I admit, but it seems to
> indicate that an ELO rating 4154 above Fuego at 1 minute is FAR above the
> best player.
>
> So what is the basis of your statement that the asymptote is "far from
> perfect play"?
>
> I must have missed this in the paper as I read it quickly, but I did not
> see any claims in the paper about this, which I considered a good thing
> since we don't know where perfect play is.
>
> Don
>
> On Mon, Jun 20, 2011 at 4:03 PM, Richard B Segal <[email protected]> wrote:
>> Hi All,
>>
>> As my scalability paper has come up several times in the last few weeks, I
>> thought it would be useful to give a brief overview of the results
>> relevant to the current conversation and provide additional insights.
>>
>> The first set of experiments analyzed the scalability of Fuego on a
>> single-core machine without multithreading. The experiments consisted of
>> 1,000 self-play games where one player received N minutes for all moves
>> and the second player received 2*N minutes for all moves. The resulting
>> scaling graph looks qualitatively similar to the recent scaling graph
>> posted for Pachi.
>> The important observation is that the ELO gained between each successive
>> doubling decreases as search time increases. If we assume this trend
>> continues, then the ELO gained for each doubling will converge to zero,
>> forcing absolute performance to level off. Ideally, this asymptote would
>> not occur until perfect play is reached. Unfortunately, for Fuego, fitting
>> an exponential curve to the data suggests a performance asymptote that is
>> far from perfect play.
>>
>> This disappointing result raises lots of questions and caveats. The
>> results are specific to Fuego. I suspect all programs will display a
>> similarly shaped scaling curve and asymptote. However, better-designed
>> programs should asymptote to better performance. The best programs may
>> asymptote to perfect play. The results are for self-play, with all its
>> inherent limitations. However, I suspect self-play will overestimate true
>> scaling, so the results are still probably an upper bound on Fuego's true
>> performance. Sufficient memory was available and did not affect the
>> results. All results are for 19x19 play.
>>
>> The level of play that Fuego appears to asymptote to is well below expert
>> play and may even be below the performance of the top programs such as
>> Zen and Erica. When Fuego is run on our 32-core / 128-thread competition
>> machine, the above analysis suggests Fuego is within 500 ELO of its
>> maximum possible play quality. My (almost random) guess is that on
>> competition hardware Fuego is about 200 to 300 ELO weaker than Zen and
>> Erica.
>>
>> The big question is why this asymptote exists at such a weak level of
>> play. Unfortunately, I do not have a definitive answer to this question.
>> Some important observations: I agree with much of the earlier discussion
>> that any well-written MCTS implementation will eventually converge on
>> optimal play.
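[Editor's note: the curve-fitting argument above can be illustrated with a
toy calculation, entirely invented numbers rather than Richard's data: if
the ELO gain per time doubling decays geometrically with ratio r < 1, the
sum of all future gains is finite (g0 / (1 - r)), which is exactly a
performance asymptote.]

```python
# Sketch with invented data: estimate the asymptote implied by
# geometrically decaying ELO gains per doubling of search time.
# None of these numbers come from the scalability paper.

import math

# Hypothetical ELO gain observed at each successive time doubling.
gains = [240, 180, 135, 101, 76]

# Fit gain_k = g0 * r**k by log-linear least squares.
logs = [math.log(g) for g in gains]
n = len(gains)
k_mean = (n - 1) / 2
log_mean = sum(logs) / n
slope = sum((k - k_mean) * (l - log_mean) for k, l in zip(range(n), logs)) \
        / sum((k - k_mean) ** 2 for k in range(n))
r = math.exp(slope)                  # decay ratio per doubling (~0.75 here)
g0 = math.exp(log_mean - slope * k_mean)

# For r < 1 the geometric series of future gains converges, so total
# improvement above the base level is bounded:
asymptote_gain = g0 / (1 - r)
print(round(r, 3), round(asymptote_gain))
```

Of course, as Richard notes below, curve fitting alone may not distinguish
a true asymptote from an extremely slowly improving curve: a decay ratio
close to 1 looks locally identical to one that is exactly 1.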
>> I believe I can prove that any MCTS implementation that meets the
>> following criteria:
>>
>> 1) Eventually includes all possible legal move sequences in the tree,
>> 2) Correctly scores terminal positions,
>> 3) Selects the highest-scoring node an infinite number of times, and
>> 4) Never lets a node's score drop to zero unless it's proven a loss,
>>
>> will eventually achieve optimal play. The above applies to traditional
>> UCT, greedy versions of UCT without an exploration term, and most RAVE
>> variants. Note, I suspect there is some confusion about how quickly the
>> above converges. The convergence of MCTS is much worse than linear in the
>> size of the search space. What happens when a move looks good for one
>> million trials and then a refuting response is found? MCTS may need to
>> revisit the refuting node millions of times to reduce the score of the
>> parent move to a low enough value for an alternative move to be preferred.
>> MCTS has been shown to have a worst-case convergence of 2^(2^(2^...)),
>> where the number of exponentiations is linear in the tree depth.*
>>
>> My guess as to why the asymptote exists at such a weak level of play is a
>> combination of problems with Fuego and the poor scalability of MCTS as I
>> described above. The scaling of MCTS is so poor that it may be impossible,
>> from simple curve fitting, to distinguish an asymptote from an extremely
>> slowly improving curve. I suspect the reason that Fuego hits its asymptote
>> so early in the play-quality curve is due to limits in its design.
>> Programs that can get more value out of a fixed-size tree can achieve
>> higher levels of play before the awful scaling of MCTS overtakes them and
>> prevents further practical improvement.
>>
>> - Rich
>>
>> * P. Coquelin and R. Munos. Bandit Algorithms for Tree Search. UAI 2007.
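[Editor's note: the "million trials then a refutation" point is easy to
quantify with a toy calculation, numbers invented for illustration: if a
node's mean value is 0.6 after 1,000,000 playouts and every later playout
through it now loses, its mean only drops below 0.5 after roughly 200,000
further visits.]

```python
# Toy illustration (numbers invented): how many losing playouts it takes
# to drag a Monte-Carlo mean value below a threshold once a refutation
# is found. The mean after m extra all-loss visits is (n*mean)/(n + m).

def visits_to_dilute(n, mean, target):
    """Extra all-loss visits until (n*mean)/(n+m) falls below target."""
    m = 0
    wins = n * mean
    while wins / (n + m) >= target:
        m += 1
    return m

# Closed form: wins/(n+m) < target  =>  m > wins/target - n
print(visits_to_dilute(1_000_000, 0.6, 0.5))  # 200001 extra visits
```

And this cost is paid at every level of the tree where a late refutation
appears, which is one intuition behind the doubly-exponential worst case
cited from Coquelin and Munos.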
>> René van de Veerdonk wrote on 06/20/2011 02:24 PM:
>>
>> > I am surprised you were asked to pay for the paper; I was not asked
>> > to pay for the download (that's why I felt okay to provide the link).
>> >
>> > From the conclusion:
>> > "We first analyzed the single-threaded scaling of Fuego and found
>> > that there is an upper bound on the play-quality improvements that
>> > can come from additional search. This in itself limits the potential
>> > advantage of Blue Gene/P as its total processing power in a one-hour
>> > timed game easily exceeds the computation needed to maximize Fuego's
>> > overall play quality."
>> >
>> > René
>> >
>> > On Mon, Jun 20, 2011 at 10:35 AM, Don Dailey <[email protected]> wrote:
>> > First of all, I never purchase papers of unknown quality. If I know
>> > a paper is great, or it comes highly recommended by someone I trust,
>> > I will purchase it, although the idea of purchasing papers in general
>> > is distasteful to me.
>> >
>> > The scalability issue as you summarize it has to do with the number of
>> > cores. I am not surprised that more cores bring less improvement, and
>> > that is true in ALL games. That's not the issue I am talking about
>> > here, although it's certainly a legitimate practical issue.
>> >
>> > To better understand the issue, forget about the specific machine
>> > it's on and substitute the phrase "scale with time."
>> >
>> > There is no scaling issue with reasonably written programs that can
>> > be tested with current hardware. Showing that it's difficult to
>> > parallelize an algorithm that is basically serial does not prove
>> > anything here.
>> > I have explained why people believe MCTS may have limits and why
>> > they are wrong. All such "evidence" that we have hit some wall is
>> > anecdotal. Unless, of course, you know of someone who has run a few
>> > thousand games at 1 day per move on a single-core machine (or
>> > equivalent).
>> >
>> > Don
>> >
>> > On Mon, Jun 20, 2011 at 1:12 PM, René van de Veerdonk
>> > <[email protected]> wrote:
>> > On Mon, Jun 20, 2011 at 8:27 AM, Don Dailey <[email protected]> wrote:
>> > On Mon, Jun 20, 2011 at 10:09 AM, terry mcintyre
>> > <[email protected]> wrote:
>> > Any particular instance of a program will probably fail to scale,
>> > especially against humans who share the lessons of experience.
>> >
>> > That is complete nonsense. How are you backing this up? What proof
>> > do you have that computers don't play better on better hardware?
>> > Why are the top programs being run on clusters and multi-core
>> > computers? Are the authors just complete idiots?
>> >
>> > Every bit of evidence we have says they are scaling very well
>> > against humans. That has also been our experience in game after
>> > game, not just in Go.
>> >
>> > I apologize for being so harsh on this, but you are too smart to be
>> > saying such dumb things.
>> >
>> > Please read (from Darren Cook):
>> >
>> > Richard Segal (who operates Blue Fuego) has a paper on the upper
>> > limit for scaling:
>> > http://www.springerlink.com/content/b8p81h40129116kl/
>> > (Sorry, I couldn't find an author's download link for the paper;
>> > Richard is on the Fuego list but I'm not sure he is even a lurker
>> > here.)
>> > Another paper that mentions the topic (also Fuego-specific) is here:
>> > http://webdocs.cs.ualberta.ca/~mmueller/ps/fuego-TCIAIG.pdf
>> >
>> > The short conclusion of these papers is that it was not worth it to
>> > scale Fuego to an IBM Blue Gene machine (the original purpose of the
>> > work), as it didn't get any better anymore, i.e., performance didn't
>> > (appreciably) scale beyond 512 or so cores. Obviously, that's still
>> > a big improvement over a desktop PC, but far from the capacity of
>> > the supercomputer available.
>> >
>> > Zen also has a similar issue according to its co-author Hideki Kato.
>> > That's evidence enough for me to indicate that there indeed is an
>> > issue with the current breed of programs.
>> >
>> > It appears that current specific implementations have a scaling
>> > ceiling. I expect, as expressed by multiple people here, that once
>> > that ceiling gets close enough to become a limitation, people will
>> > change their algorithms accordingly and scaling will continue. But
>> > it will only happen once you can comfortably test the program in
>> > that resource regime.
>> >
>> > René
>> >
>> > _______________________________________________
>> > Computer-go mailing list
>> > [email protected]
>> > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
