Hi All,

As my scalability paper has come up several times in the last few weeks I
thought it would be useful to give a brief overview of the results relevant
to the current conversation and provide additional insights.

The first set of experiments analyzed the scalability of Fuego on a single
core machine without multithreading.   The experiments consisted of 1,000
self-play games in which one player received N minutes for all moves and the
second player received 2*N minutes for all moves.   The resulting scaling
graph looks qualitatively similar to the recent scaling graph posted for
Pachi.   The important observation is that the ELO gained between each
successive doubling decreases as search time increases.    If we assume
this trend continues, then the ELO gained for each doubling will converge
to zero forcing absolute performance to level off.  Ideally, this asymptote
would not occur until perfect play is reached.   Unfortunately, for Fuego,
fitting an exponential curve to the data suggests a performance asymptote
that is far from perfect play.
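To make the extrapolation concrete, here is a small sketch using made-up per-doubling gains (illustrative numbers, not the paper's actual measurements): if the Elo gained per doubling decays roughly geometrically, the remaining gains form a convergent geometric series, which is exactly what produces a finite asymptote well short of perfect play.

```python
# Hypothetical Elo gains for successive time doublings (illustrative
# only -- NOT data from the paper). Each gain is roughly 0.8x the last.
gains = [120, 95, 75, 60, 47, 38, 30]

# Estimate the decay ratio from consecutive gains.
ratios = [b / a for a, b in zip(gains, gains[1:])]
r = sum(ratios) / len(ratios)

# Elo accumulated so far, plus the geometric tail still to come:
# gains[-1] * (r + r^2 + ...) = gains[-1] * r / (1 - r).
total_so_far = sum(gains)
remaining = gains[-1] * r / (1 - r)
asymptote = total_so_far + remaining
print(f"decay ratio ~{r:.2f}; projected ceiling ~{asymptote:.0f} Elo above baseline")
```

With these toy numbers the projected ceiling is only about 580 Elo above the weakest player measured, no matter how many further doublings are granted.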

This disappointing result raises lots of questions and caveats.   The
results are specific to Fuego.   I suspect all programs will display a
similar shaped scaling curve and asymptote.   However, better designed
programs should asymptote to better performance.   The best programs may
asymptote to perfect play.  The results are for self-play with all its
inherent limitations.   However, I suspect self-play will overestimate true
scaling, so the results are still probably an upper bound on Fuego's true
performance.  Sufficient memory was available and did not affect the
results.   All results are for 19x19 play.

The level of play that Fuego appears to asymptote to is well below expert play
and may even be below the performance of the top programs such as Zen and
Erica.   When Fuego is run on our 32 core / 128 thread competition machine,
the above analysis suggests Fuego is within 500 ELO of its maximum possible
play quality.   My (almost random) guess is that on competition hardware
Fuego is about 200 to 300 ELO weaker than Zen and Erica.

The big question is why this asymptote exists at such a weak level of play.
Unfortunately, I do not have a definitive answer to this question.   Here
are some important observations.   I agree with much of the earlier discussion that
any well written MCTS implementation will eventually converge on optimal
play.     I believe I can prove that any MCTS implementation that meets the
following criteria:

        1) Eventually includes all possible legal move sequences in the tree.
        2) Correctly scores terminal positions
        3) Selects the highest scoring node an infinite number of times
        4) And never lets a node's score drop to zero unless it's proven a
loss.

will eventually achieve optimal play.   The above applies to traditional
UCT, greedy versions of UCT without an exploration term, and most RAVE
variants.  Note: I suspect there is some confusion about how quickly the
above converges.   The convergence of MCTS is much worse than linear in the
size of the search space.   What happens when a move looks good for one
million trials and then a refuting response is found?   MCTS may need to
revisit the refuting node millions of times to reduce the score of the
parent move to a low enough value for an alternative move to be preferred.
MCTS has been shown to have a worst-case convergence of 2^(2^(2^...)) where
the number of exponentiations is linear in the tree depth.*
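The refutation scenario above can be quantified with trivial arithmetic (an illustration of the averaging effect, not a simulation of any real program):

```python
# A move has looked won for 1,000,000 playouts when a refutation is
# found, so every further visit through it now loses. How many losing
# visits must MCTS pour into the refuted move before its mean value
# drops below a sibling sitting at 0.5? (Illustrative arithmetic only.)
wins = 1_000_000
losses = 0
sibling_value = 0.5
while wins / (wins + losses) >= sibling_value:
    losses += 1  # each extra playout of the refuted line is a loss
print(losses)  # 1,000,001 losing playouts before the sibling is preferred
```

Because MCTS backs up averages rather than minimax values, the million optimistic playouts must be diluted one loss at a time before the search changes its mind.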

My guess as to why the asymptote exists at such a weak level of play is a
combination of problems specific to Fuego and the poor scalability of MCTS
described above.   The scaling of MCTS is so poor that it may be impossible,
from simple curve fitting alone, to distinguish between a true asymptote and
an extremely slowly improving curve.   I suspect the reason that Fuego hits
its asymptote so early in the play quality curve is limits in its design.
Programs that can extract more value from a fixed-size tree can achieve
higher levels of play before the awful scaling of MCTS takes over and
prevents further practical improvement.
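As an illustration of how hard those two hypotheses are to separate, the following sketch fits an unbounded logarithmic model to the same kind of made-up saturating data discussed above; the fit error is comparable to the statistical noise of a 1,000-game match, so data in this range cannot rule out unbounded (if glacial) improvement.

```python
# Fit a no-asymptote model, elo ~ c0 + c1*ln(k+1), to hypothetical
# saturating doubling data (illustrative only -- not Fuego numbers),
# and compare the residuals with the noise of a 1,000-game match.
import math

doublings = list(range(8))
elo = [0, 120, 215, 290, 350, 397, 435, 465]  # made-up, saturating

# Ordinary least-squares fit in the transformed variable x = ln(k+1).
xs = [math.log(k + 1) for k in doublings]
mx = sum(xs) / len(xs)
my = sum(elo) / len(elo)
c1 = sum((x - mx) * (y - my) for x, y in zip(xs, elo)) / sum(
    (x - mx) ** 2 for x in xs)
c0 = my - c1 * mx

residuals = [y - (c0 + c1 * x) for x, y in zip(xs, elo)]
rms = math.sqrt(sum(r * r for r in residuals) / len(residuals))

# Standard error of an Elo estimate from 1,000 games at p ~ 0.5:
# se(p) = 0.5/sqrt(1000), and d(Elo)/dp = 400 / (ln(10) * p * (1-p)).
match_se = (0.5 / math.sqrt(1000)) * 400 / (math.log(10) * 0.25)
print(f"log-model RMS residual: {rms:.1f} Elo; 1,000-game noise ~{match_se:.1f} Elo")
```

With these toy numbers the unbounded logarithmic model misfits the saturating data by only about 13 Elo RMS, roughly the measurement noise itself, so curve fitting alone cannot decide between the two shapes.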

- Rich


* P. Coquelin and R. Munos.   Bandit Algorithms for Tree Search.   UAI
2007.



[email protected] wrote on 06/20/2011 02:23:48 PM:

> Re: [Computer-go] Thoughts about bugs and scalability
> From: René van de Veerdonk, to: computer-go, 06/20/2011 02:24 PM
>
>
> I am surprised you were asked to pay for the paper, I was not asked
> to pay for the download (that's why I felt okay to provide the link).
>
> From the conclusion:
> "We first analyzed the single-threaded scaling of Fuego and found
> that there is an upper bound on the play-quality improvements that
> can come from additional search. This in itself limits the potential
> advantage of Blue Gene/P as its total processing power in a one-hour
> timed game easily exceeds the computation needed to maximize Fuego’s
> overall play quality."
>
> René
>
> On Mon, Jun 20, 2011 at 10:35 AM, Don Dailey <[email protected]> wrote:
> First of all,  I never purchase papers of unknown quality.   If I
> know a paper is great or that it comes highly recommended by someone
> I trust I will purchase it although the idea of purchasing papers in
> general is distasteful to me.
>
> The scalability issue as you summarize has to do with the number of
> cores - I am not surprised that more cores bring less improvement
> and that is true in ALL games.    That's not the issue I am talking
> about here, although it's certainly a legitimate practical issue.
>
> To better understand the issue,  forget about the specific machine
> it's on and substitute the phrase "scale with time."
>
> There is no scaling issue with reasonably written programs that can
> be tested with current hardware.    Showing that it's difficult to
> parallelize an algorithm that is basically serial does not prove
> anything here.
>
> I have explained why people believe MCTS may have limits and why
> they are wrong.   All such "evidence" that we have hit some wall is
> anecdotal.   Unless of course you know of someone who has run a few
> thousand games at 1 day per move on a single core machine (or
> equivalent.)
>
> Don
>
>
> On Mon, Jun 20, 2011 at 1:12 PM, René van de Veerdonk <
> [email protected]> wrote:
> On Mon, Jun 20, 2011 at 8:27 AM, Don Dailey <[email protected]> wrote:
> On Mon, Jun 20, 2011 at 10:09 AM, terry mcintyre <[email protected]> wrote:
> Any particular instance of a program will probably fail to scale -
> especially against humans who share the lessons of experience.
>
> That is complete nonsense.   How are you backing this up?   What
> proof do you have that computers don't play better on better
> hardware?   Why are the top programs being run on clusters and
> multi-core computers?   Are the authors just complete idiots?
>
> Every bit of evidence we have says they are scaling very well
> against humans.     That has also been our experience in game after
> game,  not just in Go.
>
> I apologize for being so harsh on this,  but you are too smart to be
> saying such dumb things.
>
> Please read (from Darren Cook):
>
> Richard Segal (who operates Blue Fuego) has a paper on the upper limit
> for scaling:
> http://www.springerlink.com/content/b8p81h40129116kl/
> (Sorry, I couldn't find an author's download link for the paper; Richard
> is on the Fuego list but I'm not sure he is even a lurker here.)
>
> Another paper that mentions the topic (also Fuego specific) is here:
> http://webdocs.cs.ualberta.ca/~mmueller/ps/fuego-TCIAIG.pdf
>
> The short conclusion of these papers is that it was not worth it to
> scale Fuego to an IBM Blue Gene machine (the original purpose of the
> work), as it didn't get any better anymore, i.e., performance didn't
> (appreciably) scale beyond 512 or so cores. Obviously, that's still
> a big improvement over a desktop PC, but far from the capacity of
> the supercomputer available.
>
> Zen also has a similar issue according to its co-author Hideki Kato.
> That's evidence enough for me to indicate that there indeed is an
> issue with the current breed of programs.
>
> It appears that current specific implementations have a scaling
> ceiling. I expect, as expressed by multiple people here, that once
> that ceiling gets close enough to become a limitation, people will
> change their algorithms accordingly and scaling will continue. But
> it will only happen once you can comfortably test the program in
> that resource regime.
>
> René
>
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
