It's more than 100 vanilla Elo points per rank.

The KGS rating algorithm sets 1 rank as:
~139 for 30k-5k (70% win rate expected in even games between 5k and 6k)
~226 for 2d+    (79% win rate expected in even games between 2d and 3d)
(Sliding scale between 5k and 2d.)

EGF uses a similar sliding scale of winning percentages:
80% win rate expected between 6d and 7d
64% win rate expected between 29k and 30k
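For reference, the "vanilla Elo" figures above can be checked against the standard logistic formula, where an expected win rate p implies a rating gap of 400*log10(p/(1-p)). A quick Python sketch (nothing KGS- or EGF-specific; their internal scale constants differ slightly from vanilla Elo, which is why KGS quotes ~139 and ~226 rather than the vanilla values):

```python
import math

def elo_gap(p):
    """Rating gap implied by expected win rate p under standard Elo."""
    return 400 * math.log10(p / (1 - p))

for p in (0.64, 0.70, 0.79, 0.80):
    print(f"{p:.0%} -> {elo_gap(p):.0f} Elo")
```

This prints 100, 147, 230, and 241 Elo respectively, so the 64% bottom-of-scale rank is almost exactly 100 vanilla Elo points, and every other rank is worth more.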


On Mon, Jun 20, 2011 at 3:20 PM, Don Dailey <[email protected]> wrote:

> Richard,
>
> You said the performance asymptote is far from perfect play, but your
> projection in the paper is an ELO of 4154 ABOVE the level of Fuego playing
> at 1 minute per game.
>
> It has been estimated that 100 ELO is about 1 dan at the low-kyu to
> high-dan levels, and even less at higher levels.    This implies by your own
> calculation that there are at LEAST 41 ranks between the 1 minute Fuego
> player and the asymptote that you estimated.
>
> That "back of the envelope" calculation is crude, I admit, but it seems to
> indicate that an ELO rating of 4154 compared to Fuego at 1 minute is FAR
> above the best player.
>
> So what is the basis of your statement that the asymptote is "far from
> perfect play?"
>
> I must have missed this in the paper as I read it quickly,  but I did not
> see any claims in the paper about this,  which I considered a good thing
> since we don't know where perfect play is.
>
> Don
>
>
> On Mon, Jun 20, 2011 at 4:03 PM, Richard B Segal <[email protected]> wrote:
>
>>  Hi All,
>>
>> As my scalability paper has come up several times in the last few weeks I
>> thought it would be useful to give a brief overview of the results relevant
>> to the current conversation and provide additional insights.
>>
>> The first set of experiments analyzed the scalability of Fuego on a single
>> core machine without multithreading. The experiments consisted of 1,000
>> self-play games where one player received N minutes for all moves and the
>> second player received 2*N minutes for all moves. The resulting scaling
>> graph looks qualitatively similar to the recent scaling graph posted for
>> Pachi. The important observation is that the ELO gained between each
>> successive doubling decreases as search time increases. If we assume this
>> trend continues, then the ELO gained for each doubling will converge to zero,
>> forcing absolute performance to level off. Ideally, this asymptote would not
>> occur until perfect play is reached. Unfortunately, for Fuego, fitting an
>> exponential curve to the data suggests a performance asymptote that is far
>> from perfect play.
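The asymptote argument above can be made concrete: if each doubling of search time yields an Elo gain that decays geometrically (g0 * r^n after n doublings, for some r < 1), the total attainable improvement is bounded by the geometric sum g0 / (1 - r). A toy sketch of this model (g0 = 120 Elo and r = 0.8 are invented for illustration, not fitted values from the paper):

```python
# Toy model: Elo gain per time-doubling decays geometrically.
# g0 and r are illustrative guesses, not numbers from Segal's paper.
g0, r = 120.0, 0.8

# Cumulative gain after 40 doublings vs. the implied asymptote g0 / (1 - r).
total = sum(g0 * r**n for n in range(40))
asymptote = g0 / (1 - r)
print(f"after 40 doublings: {total:.1f} Elo, asymptote: {asymptote:.1f} Elo")
```

With these made-up constants the program gains essentially all of its 600 Elo ceiling within a few dozen doublings, no matter how much more time it is given; the fitted curve in the paper makes the same qualitative claim for Fuego.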
>>
>> This disappointing result raises lots of questions and caveats. The
>> results are specific to Fuego. I suspect all programs will display a similarly
>> shaped scaling curve and asymptote. However, better designed programs should
>> asymptote to better performance. The best programs may asymptote to perfect
>> play. The results are for self-play with all its inherent limitations.
>> However, I suspect self-play will overestimate true scaling, so the results
>> are still probably an upper bound on Fuego's true performance. Sufficient
>> memory was available and did not affect the results. All results are for
>> 19x19 play.
>>
>> The level of play that Fuego appears to asymptote to is well below expert
>> play and may even be below the performance of the top programs such as Zen
>> and Erica. When Fuego is run on our 32 core / 128 thread competition machine,
>> the above analysis suggests Fuego is within 500 ELO of its maximum possible
>> play quality. My (almost random) guess is that on competition hardware Fuego
>> is about 200 to 300 ELO weaker than Zen and Erica.
>>
>> The big question is why this asymptote exists at such a weak level of
>> play. Unfortunately, I do not have a definitive answer to this question.
>> Some important observations follow. I agree with much of the earlier
>> discussion that any well written MCTS implementation will eventually converge
>> on optimal play. I believe I can prove that any MCTS implementation that meets
>> the following criteria:
>>
>>  1) Eventually includes all possible legal move sequences in the tree.
>>  2) Correctly scores terminal positions
>>  3) Selects the highest scoring node an infinite number of times
>>  4) And never lets a node's score drop to zero unless it's proven a loss.
>>
>> will eventually achieve optimal play. The above applies to traditional
>> UCT, greedy versions of UCT without an exploration term, and most RAVE
>> variants. Note, I suspect there is some confusion about how quickly the
>> above converges. The convergence of MCTS is much worse than linear in the
>> size of the search space. What happens when a move looks good for one
>> million trials and then a refuting response is found? MCTS may need to
>> revisit the refuting node millions of times to reduce the score of the
>> parent move to a low enough value for an alternative move to be preferred.
>> MCTS has been shown to have a worst-case convergence of 2^(2^(2^...)), where
>> the number of exponentiations is linear in the tree depth.*
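The refutation example in the paragraph above is easy to quantify. Under plain mean-value backup (a deliberate simplification of UCT's actual selection rule), a move that won its first N simulations needs more than N losing revisits before its estimated value falls below an alternative move valued at 0.5. A hypothetical sketch:

```python
# Sketch of the refutation problem under plain mean-value backup.
# A move won its first `wins` simulations; after a refutation is found,
# every further visit loses. Count visits until its mean value drops
# below an alternative move whose estimated value is `alternative`.
def revisits_needed(wins, alternative=0.5):
    losses = 0
    while wins / (wins + losses) >= alternative:
        losses += 1
    return losses

print(revisits_needed(1_000_000))  # 1000001: over a million losing revisits
```

So a move that looked good for a million trials ties up over a million further trials just to be demoted, which is the "much worse than linear" behavior described above.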
>>
>> My guess as to why the asymptote exists at such a weak level of play is a
>> combination of problems with Fuego and the scalability of MCTS as I describe
>> above. The scaling of MCTS is so poor that it may be impossible from simple
>> curve fitting to distinguish an asymptote and an extremely slowly improving
>> curve. I suspect the reason that Fuego hits its asymptote so early in the
>> play quality curve is due to limits in its design. Programs that can get
>> more value out of a fixed-size tree can achieve higher levels of play
>> before the awful scaling of MCTS overtakes them and prevents further
>> practical improvement.
>>
>> - Rich
>>
>>
>> * P. Coquelin and R. Munos. Bandit Algorithms for Tree Search. UAI 2007.
>>
>>
>>
>> [email protected] wrote on 06/20/2011 02:23:48 PM:
>>
>> > Re: [Computer-go] Thoughts about bugs and scalability
>> >
>> > I am surprised you were asked to pay for the paper; I was not asked
>> > to pay for the download (that's why I felt okay to provide the link).
>> >
>> > From the conclusion:
>> > "We first analyzed the single-threaded scaling of Fuego and found
>> > that there is an upper bound on the play-quality improvements that
>> > can come from additional search. This in itself limits the potential
>> > advantage of Blue Gene/P as its total processing power in a one-hour
>> > timed game easily exceeds the computation needed to maximize Fuego’s
>> > overall play quality."
>> >
>> > René
>> >
>> > On Mon, Jun 20, 2011 at 10:35 AM, Don Dailey <[email protected]>
>> wrote:
>> > First of all, I never purchase papers of unknown quality.  If I
>> > know a paper is great, or that it comes highly recommended by someone
>> > I trust, I will purchase it, although the idea of purchasing papers in
>> > general is distasteful to me.
>> >
>> > The scalability issue as you summarize it has to do with the number of
>> > cores - I am not surprised that more cores bring less improvement,
>> > and that is true in ALL games.    That's not the issue I am talking
>> > about here, although it's certainly a legitimate practical issue.
>> >
>> > To better understand the issue,  forget about the specific machine
>> > it's on and substitute the phrase "scale with time."
>> >
>> > There is no scaling issue with reasonably written programs that can
>> > be tested with current hardware.    Showing that it's difficult to
>> > parallelize an algorithm that is basically serial does not prove
>> > anything here.
>> >
>> > I have explained why people believe MCTS may have limits and why
>> > they are wrong.   All such "evidence" that we have hit some wall is
>> > anecdotal.   Unless of course you know of someone who has run a few
>> > thousand games at 1 day per move on a single core machine (or
>> equivalent.)
>> >
>> > Don
>> >
>> >
>> > On Mon, Jun 20, 2011 at 1:12 PM, René van de Veerdonk <
>> > [email protected]> wrote:
>> > On Mon, Jun 20, 2011 at 8:27 AM, Don Dailey <[email protected]>
>> wrote:
>> > On Mon, Jun 20, 2011 at 10:09 AM, terry mcintyre <
>> [email protected]
>> > > wrote:
>> > Any particular instance of a program will probably fail to scale -
>> > especially against humans who share the lessons of experience.
>> >
>> > That is complete nonsense.   How are you backing this up?   What
>> > proof do you have that computers don't play better on better
>> > hardware?   Why are the top programs being run on clusters and
>> > multi-core computers?   Are the authors just complete idiots?
>> >
>> > Every bit of evidence we have says they are scaling very well
>> > against humans.     That has also been our experience in game after
>> > game,  not just in Go.
>> >
>> > I apologize for being so harsh on this,  but you are too smart to be
>> > saying such dumb things.
>> >
>> > Please read (from Darren Cook):
>> >
>> > Richard Segal (who operates Blue Fuego) has a paper on the upper limit
>> > for scaling:
>> > http://www.springerlink.com/content/b8p81h40129116kl/
>> > (Sorry, I couldn't find an author's download link for the paper; Richard
>> > is on the Fuego list but I'm not sure he is even a lurker here.)
>> >
>> > Another paper that mentions the topic (also Fuego specific) is here:
>> > http://webdocs.cs.ualberta.ca/~mmueller/ps/fuego-TCIAIG.pdf
>> >
>> > The short conclusion of these papers is that it was not worth it to
>> > scale Fuego to an IBM Blue Gene machine (the original purpose of the
>> > work), as it didn't get any better anymore, i.e., performance didn't
>> > (appreciably) scale beyond 512 or so cores. Obviously, that's still
>> > a big improvement over a desktop PC, but far from the capacity of
>> > the supercomputer available.
>> >
>> > Zen also has a similar issue according to its co-author Hideki Kato.
>> > That's evidence enough for me to indicate that there indeed is an
>> > issue with the current breed of programs.
>> >
>> > It appears that current specific implementations have a scaling
>> > ceiling. I expect, as expressed by multiple people here, that once
>> > that ceiling gets close enough to become a limitation, people will
>> > change their algorithms accordingly and scaling will continue. But
>> > it will only happen once you can comfortably test the program in
>> > that resource regime.
>> >
>> > René
>> >
>> > _______________________________________________
>> > Computer-go mailing list
>> > [email protected]
>> > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>> >
>>
>>
>>
>
>
>
