Re: [Computer-go] Thoughts about bugs and scalability

Don Dailey Mon, 20 Jun 2011 13:54:30 -0700

On Mon, Jun 20, 2011 at 4:49 PM, David Fotland <[email protected]>wrote:


> Also, this rating is calculated by self-play, so it overstates the true
> rating.  The asymptotic rating is achievable with today’s hardware, but
> fuego on that many cores (my assumption) is not as strong as a professional
> go player.
>

Yes,  I completely agree.    Self-testing (actually I think this is more
about computer/computer testing)  is subject to what I call "rating
expansion"  which actually is a good thing for measuring small improvements.


Don





> ****
>
> ** **
>
> David****
>
> ** **
>
> *From:* [email protected] [mailto:
> [email protected]] *On Behalf Of *Andy
> *Sent:* Monday, June 20, 2011 1:35 PM
> *To:* [email protected]
> *Subject:* Re: [Computer-go] Thoughts about bugs and scalability****
>
> ** **
>
> It's more than 100 vanilla Elo points per rank.****
>
> ** **
>
> The KGS rating algorithm sets 1 rank as:****
>
> ~139 for 30k-5k  (70% win rate expected in even games between 5k and 6k)**
> **
>
> ~226 for 2d+      (79% win rate expected in even games between 2d and 3d)*
> ***
>
> (Sliding scale between 5k and 2d.)****
>
> ** **
>
> EGF uses a similar system to slide the winning percentages****
>
> 80% win rate expected between 6d and 7d****
>
> 64% win rate expected between 29k and 30k ****
>
> ** **
>
> On Mon, Jun 20, 2011 at 3:20 PM, Don Dailey <[email protected]> wrote:*
> ***
>
> Richard,****
>
> ** **
>
> You said the performance asymptote is far from perfect play,  but your
> projection in the paper is an  ELO of  4154 ABOVE the level of Fuego playing
> at 1 minute per game.   ****
>
>  ****
>
> It has been estimated that 100 ELO is about 1 Dan at the low kyu high dan
> levels and even less at higher levels.    This implies by your own
> calculation that there are at LEAST 41 ranks between the 1 minute Feugo
> player and asymptote that you estimated.   ****
>
> ** **
>
> That "back of the envelope" calculation is crude I admit but it seems to
> indicate that an ELO rating of 4154 compared to Fuego at 1 minute is FAR
> above the best player.   ****
>
> ** **
>
> So what is the basis of your statement that the asymptote is "far from
> perfect play?"****
>
> ** **
>
> I must have missed this in the paper as I read it quickly,  but I did not
> see any claims in the paper about this,  which I considered a good things
> since we don't know where perfect play is. ****
>
> ** **
>
> Don****
>
> ** **
>
> ** **
>
> ** **
>
> The implication of that and your own calculation is that there are 41 ranks
> between the 1 minute Fuego and the asymptote.   ****
>
> ** **
>
> On Mon, Jun 20, 2011 at 4:03 PM, Richard B Segal <[email protected]>
> wrote:****
>
> Hi All,
>
> As my scalability paper has come up several times in the last few weeks I
> thought it would be useful to give a brief overview of the results relevant
> to the current conversation and provide additional insights.
>
> The first set of experiments analyzed the scalability of Fuego on a single
> core machine without multithreading. The experiments consisted of 1,000
> self-play games where one player received N minutes for all moves, and the
> second player receiving 2*N minutes for all moves. The resulting scaling
> graph looks qualitatively similar to the recent scaling graph posted for
> Pachi. The important observation is that the ELO gained between each
> successive doubling decreases as search time increases. If we assume this
> trend continues, then the ELO gained for each doubling will converge to zero
> forcing absolute performance to level off. Ideally, this asymptote would not
> occur until perfect play is reached. Unfortunately, for Fuego, fitting an
> exponential curve to the data suggests a performance asymptote that is far
> from perfect play.
>
> This disappointing result raises lots of questions and caveats. The results
> are specific to Fuego. I suspect all programs will display a similar shaped
> scaling curve and asymptote. However, better designed programs should
> asymptote to better performance. The best programs may asymptote to perfect
> play. The results are for self-play with all its inherent limitations.
> However, I suspect self-play will overestimate true scaling, so the results
> are still probably an upper bound on Fuego's true performance. Sufficient
> memory was available and did not affect the results. All results are for
> 19x19 play.
>
> The level of play that Fuego appears to asymptote is well below expert play
> and may even be below the performance of the top programs such as Zen and
> Erica. When Fuego is run our 32 core / 128 thread competition machine, the
> above analysis suggests Fuego is within 500 ELO of its maximum possible play
> quality. My (almost random) guess is that on competition hardware Fuego is
> about 200 to 300 elo weaker than Zen and Erica on competition hardware.
>
> The big question is why this asymptote exists at such a weak level of play.
> Unfortunately, I do not have a definitive answer to this question. Some
> important observations. I agree with much of the earlier discussion that any
> well written MCTS implementation will eventually converge on optimal play. I
> believe I can prove that any MCTS implementation that meets the following
> criteria:
>
> 1) Eventually includes all possible legal move sequences in the tree.
> 2) Correctly scores terminal positions
> 3) Selects the highest scoring node an infinite number of times
> 4) And never lets a node's score drop to zero unless its proven a loss.
>
> will eventually achieve optimal play. The above applies to traditional UCT,
> greedy versions of UCT without an exploration term, and most RAVE variants.
> Note, I suspect there is some confusion about how quickly the above
> converges. The convergence of MCTS is much worse than linear in the size of
> the search space. What happens when a move looks good for one million trials
> and then a refuting response is found? MCTS may need to revisit the refuting
> node millions of times to reduce the score of the parent move to a low
> enough value for an alternative move to be preferred. MCTS has been shown to
> have a worst case convergence of 2^(2^(2^....) where the number of
> exponentiations is linear in the tree depth*
>
> My guess as to why the asymptote exists at such a weak level of play is a
> combination of problems with Fuego and the scalability of MCTS as I describe
> above. The scaling of MCTS is so poor that it may be impossible from simple
> curve fitting to distinguish an asymptote and an extremely slowly improving
> curve. I suspect the reason that Fuego hits its asymptote so early in the
> play quality curve is due to limits in its design. Programs that can get
> more value out of a fixed sized tree can achieve higher levels of play
> before the awful scaling of MCTS overtakes and prevents further practical
> improvement.
>
> - Rich
>
>
> * P. Coquelin and R. Munos. Bandit Algorithms for Tree Search. UAI 2007.
>
>
>
> [email protected] wrote on 06/20/2011 02:23:48 PM:
>
> > [image removed]
> >
> > Re: [Computer-go] Thoughts about bugs and scalability
> >
> > René van de Veerdonk
> >
> > to:
> >
> > computer-go
> >
> > 06/20/2011 02:24 PM
> >
> > Sent by:
> >
> > [email protected]
> >
> > Please respond to computer-go****
>
>
> >
> > I am surprised you were asked to pay for the paper, I was not asked
> > to pay for the download (that's why I felt okay to provide the link).
> >
> > From the conclusion:
> > "We first analyzed the single-threaded scaling of Fuego and found
> > that there is an upper bound on the play-quality improvements that
> > can come from additional search. This in itself limits the potential
> > advantage of Blue Gene/P as its total processing power in a one-hour
> > timed game easily exceeds the computation needed to maximize Fuego’s
> > overall play quality."
> >
> > René
> >
> > On Mon, Jun 20, 2011 at 10:35 AM, Don Dailey <[email protected]>
> wrote:
> > First of all,  I never purchase papers of unknown quality.   If I
> > know a paper is great or that it comes highly recommended by someone
> > I trust I will purchase it although the idea of purchasing papers in
> > general is distasteful to me.
> >
> > The scalability issue as you summarize has to do with the number of
> > cores - I am not surprised that more cores bring less improvement
> > and that is true in ALL games.    That's not the issue I am talking
> > about here although it's certain a legitimate practical issue.
> >
> > To better understand the issue,  forget about the specific machine
> > it's on and substitute the phrase "scale with time."
> >
> > There is no scaling issue with reasonably written programs that can
> > be tested with current hardware.    Showing that it's difficult to
> > parallelize an algorithm that is basically serial does not prove
> > anything here.
> >
> > I have explained why people believe MCTS may have limits and why
> > they are wrong.   All such "evidence" that we have hit some wall is
> > anecdotal.   Unless of course you know of someone who has run a few
> > thousands games at 1 day per move on a single core machine (or
> equivalent.)
> >
> > Don
> >
> >
> > On Mon, Jun 20, 2011 at 1:12 PM, René van de Veerdonk <
> > [email protected]> wrote:
> > On Mon, Jun 20, 2011 at 8:27 AM, Don Dailey <[email protected]>
> wrote:
> > On Mon, Jun 20, 2011 at 10:09 AM, terry mcintyre <
> [email protected]
> > > wrote:
> > Any particular instance of a program will probably fail to scale -
> > especially against humans who share the lessons of experience.
> >
> > That is complete nonsense.   How are you backing this up?   What
> > proof do you have that computers don't play better on better
> > hardware?   Why are the top programs being run on clusters and
> > multi-core computers?   Are the authors just complete idiots?
> >
> > Every bit of evidence we have says they are scaling very well
> > against humans.     That has also been our experience in game after
> > game,  not just in Go.
> >
> > I apologize for being so harsh on this,  but you are too smart to be
> > saying such dumb things.
> >
> > Please read (from Darren Cook):
> >
> > Richard Segal (who operates Blue Fuego) has a paper on the upper limit
> > for scaling:
> > http://www.springerlink.com/content/b8p81h40129116kl/
> > (Sorry, I couldn't find an author's download link for the paper; Richard
> > is on the Fuego list but I'm not sure he is even a lurker here.)
> >
> > Another paper that mentions the topic (also Fuego specific) is here:
> > http://webdocs.cs.ualberta.ca/~mmueller/ps/fuego-TCIAIG.pdf
> >
> > The short conclusion of these papers is that it was not worth it to
> > scale Fuego to a IBM/BlueGene machine (original purpose of the
> > work), as it didn't get any better anymore, i.e., performance didn't
> > (appreciably) scale beyond 512 or so cores. Obviously, that's still
> > a big improvement from a desktop pc, but far from the capacity of
> > the supercomputer available.
> >
> > Zen also has a similar issue according to its co-author Kideko Kato.
> > That's evidence enough for me to indicate that there indeed is an
> > issue with the current breed of programs.
> >
> > It appears that current specific implementations have a scaling
> > ceiling. I expect, as expressed by multiple people here, that once
> > that ceiling gets close enough to become a limitation, people will
> > change their algorithms accordingly and scaling will continue. But
> > it will only happen once you can comfortably test the program in
> > that resource regime.
> >
> > René
> >
> > _______________________________________________
> > Computer-go mailing list
> > [email protected]
> > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
> >
> >
> > _______________________________________________
> > Computer-go mailing list
> > [email protected]
> > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
> > _______________________________________________
> > Computer-go mailing list
> > [email protected]
> > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go****
>
>
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go****
>
> ** **
>
>
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go****
>
> ** **
>
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

Re: [Computer-go] Thoughts about bugs and scalability

Reply via email to