Also, this rating is calculated by self-play, so it overstates the true rating. The asymptotic rating is achievable with today's hardware, but Fuego on that many cores (my assumption) is not as strong as a professional go player.
David

From: [email protected] [mailto:[email protected]] On Behalf Of Andy
Sent: Monday, June 20, 2011 1:35 PM
To: [email protected]
Subject: Re: [Computer-go] Thoughts about bugs and scalability

It's more than 100 vanilla Elo points per rank. The KGS rating algorithm sets 1 rank as:
~139 Elo for 30k-5k (70% win rate expected in even games between 5k and 6k)
~226 Elo for 2d+ (79% win rate expected in even games between 2d and 3d)
(Sliding scale between 5k and 2d.)
The EGF uses a similar system, sliding the expected winning percentages:
80% win rate expected between 6d and 7d
64% win rate expected between 29k and 30k

On Mon, Jun 20, 2011 at 3:20 PM, Don Dailey <[email protected]> wrote:

Richard,

You said the performance asymptote is far from perfect play, but your projection in the paper is an Elo rating 4154 points ABOVE the level of Fuego playing at 1 minute per game. It has been estimated that 100 Elo is about 1 rank at the low-kyu/high-dan levels, and even less at higher levels. This implies, by your own calculation, that there are at LEAST 41 ranks between the 1-minute Fuego player and the asymptote you estimated. That "back of the envelope" calculation is crude, I admit, but it seems to indicate that an Elo rating 4154 points above Fuego at 1 minute is FAR above the best player. So what is the basis of your statement that the asymptote is "far from perfect play"? I must have missed this in the paper as I read it quickly, but I did not see any claims in the paper about this, which I considered a good thing since we don't know where perfect play is.

Don

On Mon, Jun 20, 2011 at 4:03 PM, Richard B Segal <[email protected]> wrote:

Hi All,

As my scalability paper has come up several times in the last few weeks, I thought it would be useful to give a brief overview of the results relevant to the current conversation and provide additional insights.
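Andy's rank widths line up with the standard Elo expectation formula. A quick check in Python (the KGS and EGF algorithms are more involved; this is just the vanilla Elo curve applied to the quoted numbers):

```python
def expected_score(elo_diff: float) -> float:
    """Vanilla Elo: expected score for a player rated `elo_diff`
    points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# The rank widths quoted above:
print(f"139 Elo -> {expected_score(139):.1%}")  # ~69%, vs. the quoted 70% for 5k over 6k
print(f"226 Elo -> {expected_score(226):.1%}")  # ~79%, vs. the quoted 79% for 2d over 3d
```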
The first set of experiments analyzed the scalability of Fuego on a single-core machine without multithreading. The experiments consisted of 1,000 self-play games where one player received N minutes for all moves and the second player received 2*N minutes for all moves. The resulting scaling graph looks qualitatively similar to the recent scaling graph posted for Pachi. The important observation is that the Elo gained between each successive doubling decreases as search time increases. If we assume this trend continues, then the Elo gained for each doubling will converge to zero, forcing absolute performance to level off. Ideally, this asymptote would not occur until perfect play is reached. Unfortunately, for Fuego, fitting an exponential curve to the data suggests a performance asymptote that is far from perfect play.

This disappointing result raises lots of questions and caveats:

- The results are specific to Fuego. I suspect all programs will display a similarly shaped scaling curve and asymptote. However, better-designed programs should asymptote to better performance. The best programs may asymptote to perfect play.
- The results are for self-play, with all its inherent limitations. However, I suspect self-play will overestimate true scaling, so the results are still probably an upper bound on Fuego's true performance.
- Sufficient memory was available and did not affect the results.
- All results are for 19x19 play.

The level of play that Fuego appears to asymptote to is well below expert play and may even be below the performance of the top programs such as Zen and Erica. When Fuego is run on our 32-core / 128-thread competition machine, the above analysis suggests Fuego is within 500 Elo of its maximum possible play quality. My (almost random) guess is that on competition hardware Fuego is about 200 to 300 Elo weaker than Zen and Erica. The big question is why this asymptote exists at such a weak level of play.
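The shape of the argument above can be sketched numerically. Assuming, purely for illustration (these gains are made up, not the paper's data), that the Elo gain per time doubling decays geometrically with some ratio r, the total attainable improvement is the sum of a geometric series, g0 / (1 - r):

```python
# Hypothetical Elo gains for five successive time doublings
# (illustrative numbers only, not taken from the paper).
gains = [120.0, 95.0, 76.0, 61.0, 49.0]

# Estimate the per-doubling decay ratio r from consecutive gains.
ratios = [later / earlier for earlier, later in zip(gains, gains[1:])]
r = sum(ratios) / len(ratios)

# If gains continue as g_k = gains[0] * r**k, the total attainable
# improvement is a geometric series with a finite sum:
total_gain = gains[0] / (1.0 - r)
print(f"decay ratio r ~ {r:.2f}, projected asymptote ~ {total_gain:.0f} Elo above base")
```

Note that the same fit cannot reliably distinguish a true asymptote from a very slowly improving curve, which is a caveat raised later in the thread.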
Unfortunately, I do not have a definitive answer to this question. Some important observations:

I agree with much of the earlier discussion that any well-written MCTS implementation will eventually converge on optimal play. I believe I can prove that any MCTS implementation that meets the following criteria will eventually achieve optimal play:

1) Eventually includes all possible legal move sequences in the tree.
2) Correctly scores terminal positions.
3) Selects the highest-scoring node an infinite number of times.
4) Never lets a node's score drop to zero unless it's proven a loss.

The above applies to traditional UCT, greedy versions of UCT without an exploration term, and most RAVE variants.

Note, I suspect there is some confusion about how quickly the above converges. The convergence of MCTS is much worse than linear in the size of the search space. What happens when a move looks good for one million trials and then a refuting response is found? MCTS may need to revisit the refuting node millions of times to reduce the score of the parent move to a low enough value for an alternative move to be preferred. MCTS has been shown to have a worst-case convergence of 2^(2^(2^...)), where the number of exponentiations is linear in the tree depth.*

My guess as to why the asymptote exists at such a weak level of play is a combination of problems with Fuego and the scalability of MCTS as described above. The scaling of MCTS is so poor that it may be impossible, from simple curve fitting, to distinguish an asymptote from an extremely slowly improving curve. I suspect the reason Fuego hits its asymptote so early in the play-quality curve is limits in its design. Programs that can get more value out of a fixed-size tree can achieve higher levels of play before the awful scaling of MCTS overtakes them and prevents further practical improvement.

- Rich

* P. Coquelin and R. Munos. Bandit Algorithms for Tree Search. UAI 2007.
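The "million trials then a refutation" point is easy to quantify for a node that tracks a plain running mean (a toy calculation, not Fuego's actual code): a move that won all of its first million playouts needs roughly another million straight losses before its estimate even falls to a coin flip.

```python
import math

def losses_to_drop_below(wins: int, visits: int, threshold: float) -> int:
    """Smallest number k of consecutive losing playouts after which the
    node's win-rate estimate wins / (visits + k) falls strictly below
    `threshold`."""
    # Solve wins / (visits + k) < threshold for the smallest integer k.
    k = math.floor(wins / threshold) - visits + 1
    return max(k, 0)

# A move that looked won for a million playouts:
print(losses_to_drop_below(wins=1_000_000, visits=1_000_000, threshold=0.5))
# -> 1000001
```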
[email protected] wrote on 06/20/2011 02:23:48 PM: > [image removed] > > Re: [Computer-go] Thoughts about bugs and scalability > > René van de Veerdonk > > to: > > computer-go > > 06/20/2011 02:24 PM > > Sent by: > > [email protected] > > Please respond to computer-go > > I am surprised you were asked to pay for the paper, I was not asked > to pay for the download (that's why I felt okay to provide the link). > > From the conclusion: > "We first analyzed the single-threaded scaling of Fuego and found > that there is an upper bound on the play-quality improvements that > can come from additional search. This in itself limits the potential > advantage of Blue Gene/P as its total processing power in a one-hour > timed game easily exceeds the computation needed to maximize Fuegos > overall play quality." > > René > > On Mon, Jun 20, 2011 at 10:35 AM, Don Dailey <[email protected]> wrote: > First of all, I never purchase papers of unknown quality. If I > know a paper is great or that it comes highly recommended by someone > I trust I will purchase it although the idea of purchasing papers in > general is distasteful to me. > > The scalability issue as you summarize has to do with the number of > cores - I am not surprised that more cores bring less improvement > and that is true in ALL games. That's not the issue I am talking > about here although it's certain a legitimate practical issue. > > To better understand the issue, forget about the specific machine > it's on and substitute the phrase "scale with time." > > There is no scaling issue with reasonably written programs that can > be tested with current hardware. Showing that it's difficult to > parallelize an algorithm that is basically serial does not prove > anything here. > > I have explained why people believe MCTS may have limits and why > they are wrong. All such "evidence" that we have hit some wall is > anecdotal. 
> Unless of course you know of someone who has run a few
> thousand games at 1 day per move on a single-core machine (or equivalent).
>
> Don
>
> On Mon, Jun 20, 2011 at 1:12 PM, René van de Veerdonk <
> [email protected]> wrote:
> On Mon, Jun 20, 2011 at 8:27 AM, Don Dailey <[email protected]> wrote:
> On Mon, Jun 20, 2011 at 10:09 AM, terry mcintyre <[email protected]> wrote:
> Any particular instance of a program will probably fail to scale,
> especially against humans who share the lessons of experience.
>
> That is complete nonsense. How are you backing this up? What
> proof do you have that computers don't play better on better
> hardware? Why are the top programs being run on clusters and
> multi-core computers? Are the authors just complete idiots?
>
> Every bit of evidence we have says they are scaling very well
> against humans. That has also been our experience in game after
> game, not just in Go.
>
> I apologize for being so harsh on this, but you are too smart to be
> saying such dumb things.
>
> Please read (from Darren Cook):
>
> Richard Segal (who operates Blue Fuego) has a paper on the upper limit
> for scaling:
> http://www.springerlink.com/content/b8p81h40129116kl/
> (Sorry, I couldn't find an author's download link for the paper; Richard
> is on the Fuego list but I'm not sure he is even a lurker here.)
>
> Another paper that mentions the topic (also Fuego-specific) is here:
> http://webdocs.cs.ualberta.ca/~mmueller/ps/fuego-TCIAIG.pdf
>
> The short conclusion of these papers is that it was not worth it to
> scale Fuego to an IBM Blue Gene machine (the original purpose of the
> work), as it didn't get any better anymore, i.e., performance didn't
> (appreciably) scale beyond 512 or so cores. Obviously, that's still
> a big improvement over a desktop PC, but far from the capacity of
> the supercomputer available.
>
> Zen also has a similar issue according to its co-author Hideki Kato.
> That's evidence enough for me to indicate that there indeed is an
> issue with the current breed of programs.
>
> It appears that current specific implementations have a scaling
> ceiling. I expect, as expressed by multiple people here, that once
> that ceiling gets close enough to become a limitation, people will
> change their algorithms accordingly and scaling will continue. But
> it will only happen once you can comfortably test the program in
> that resource regime.
>
> René

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
