Hi!

  Since I haven't said it yet: congratulations to DeepMind!

  (I guess I'm a bit disappointed that no really new ML models had to
be invented for this, though; I had been wondering e.g. about capsule
networks, or about training simple iterative evaluation subroutines
(for semeai etc.) with NTM-based approaches.  Still, like everyone
else, color me very awed that such an astonishing result came out of
just what was presented.)

On Wed, Jan 27, 2016 at 11:15:59PM -0800, David Fotland wrote:
> Google’s breakthrough is just as impactful as the invention of MCTS.  
> Congratulations to the team.  It’s a huge leap for computer go, but more 
> importantly it shows that DNN can be applied to many other difficult problems.
> 
> I just added an answer.  I don’t think anyone will try to exactly replicate 
> it, but a year from now there should be several strong programs using very 
> similar techniques, with similar strength.
> 
> An interesting question is, who has integrated or is integrating a DNN into 
> their go program?  I’m working on it.  I know there are several others.
> 
> David
> 
> From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
> Jason Li
> Sent: Wednesday, January 27, 2016 3:14 PM
> To: computer-go@computer-go.org
> Subject: Re: [Computer-go] Mastering the Game of Go with Deep Neural Networks 
> and Tree Search
> 
> Congratulations to Aja!
> 
> A question to the community. Is anyone going to replicate the experimental 
> results?
> 
> https://www.quora.com/Is-anyone-replicating-the-experimental-results-of-the-human-level-Go-player-published-by-Google-Deepmind-in-Nature-in-January-2016?

  A perfect question, I think - what can we do to replicate this,
without Google's computational power?

  I probably couldn't have resisted giving it a try myself (especially
since a lot of what I do nowadays is deep NNs, though in NLP), but
thankfully I have two deadlines coming up... ;-)

  I'd propose these as the major technical points to consider when
bringing an existing Go program (or a new one) up to an AlphaGo analog:

  * Asynchronous integration of DNN evaluation with fast MCTS.  I'm
    curious about this, as I thought it would be a much bigger problem
    than it apparently is, based on old results with batch
    parallelization.  I guess virtual loss makes a lot of difference?
    Is 1 lost playout enough?  I wonder if Detlef has already solved
    this sufficiently well in oakfoam?  (A sketch of the virtual-loss
    bookkeeping I have in mind follows this list.)

    What's the typical lag of getting the GPU evaluation back (in,
    I guess, #playouts) in oakfoam, and is the throughput sufficient
    to score all expanded leaf nodes (what's the #visits threshold)?
    Sorry if this has been answered before.

  * Are RL Policy Networks essential?  AIUI from a quick reading, they
    are actually used only to produce the self-play training data for
    the value network, and based on Fig. 4 a value network trained
    without the RL policy network still got quite a bit stronger than
    Zen/CrazyStone?  Aside from the extra work, skipping them would
    save us 50 GPU-days.

    (My intuition is that RL policy networks are the part that allows
    embedding knowledge about common tsumego/semeai situations in the
    value networks, because they probably have enough capacity to
    learn them.  Does that make sense?  A toy sketch of the RL step
    follows this list.)

  * Seems like the push for SL Policy Network prediction accuracy from
    50% to 60% is really important for real-world strength (Fig. 2).
    I think the top open source solution right now has prediction
    accuracy around 50%?  IDK if there's any factor (features, dataset
    size, training procedure) involved in this other than "Updates
    were applied asynchronously on 50 GPUs using DistBelief [60];
    gradients older than 100 steps were discarded.  Training took
    around 3 weeks for 340 million training steps."  (A sketch of the
    stale-gradient rule follows this list.)

  * Value Networks require (i) 30 million self-play games (!) and
    (ii) 50 GPU-weeks to train the weights.  This seems rather
    troublesome; even 1/10 of that is a bit problematic for individual
    programmers.  It'd be interesting to see how much of that is
    diminishing returns, and whether a much smaller network on smaller
    data (+ some compromises like sampling the same game a few times,
    or adding the 8 million Tygem corpus to the mix) could do something
    interesting too.  (A sketch of the sampling compromise follows this
    list.)
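
  On the first point, to make the virtual loss question concrete, here
is a minimal Python sketch of the bookkeeping I have in mind: a single
virtual loss applied along the selected path while the position waits
in a GPU evaluation queue.  The Node/PUCT details are illustrative
guesses, not taken from AlphaGo or oakfoam.

    import math

    VIRTUAL_LOSS = 1                  # the "1 lost playout" in question

    class Node:
        def __init__(self, prior=1.0):
            self.visits = 0           # includes pending virtual losses
            self.wins = 0.0
            self.prior = prior        # move probability from the SL policy net
            self.children = {}        # move -> Node

        def puct(self, parent_visits, c=1.5):
            # AlphaGo-style selection: value estimate + prior-weighted bonus
            q = self.wins / self.visits if self.visits else 0.0
            return q + c * self.prior * math.sqrt(parent_visits) / (1 + self.visits)

    def select_and_dispatch(root, eval_queue):
        # Walk down by PUCT, then pretend the playout was already lost,
        # so that concurrent selections fan out while the GPU result is
        # still pending.
        path, node = [root], root
        while node.children:
            parent = node
            _, node = max(node.children.items(),
                          key=lambda kv: kv[1].puct(parent.visits))
            path.append(node)
        for n in path:
            n.visits += VIRTUAL_LOSS  # counted as a loss: wins unchanged
        eval_queue.put(path)          # a GPU worker evaluates the leaf position
        return path

    def backup(path, value):
        # Called once the evaluation arrives: swap the virtual loss for
        # the real result (net effect: +1 visit and +value wins per node).
        for n in path:
            n.visits += 1 - VIRTUAL_LOSS
            n.wins += value

  A GPU worker thread would pop paths from eval_queue (e.g. a
queue.Queue), batch up the leaf positions, run the network, and call
backup() with the resulting value.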
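
  On the RL policy network point, here is a toy REINFORCE update on a
linear softmax policy - my reading of what the RL step amounts to,
scaled down to a few lines; the real thing is of course a deep convnet,
and the feature encoding here is made up.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def reinforce_update(W, positions, moves, z, lr=0.01):
        # W: (num_moves, num_features) policy weights.
        # positions: feature vectors of the positions seen in one
        #   self-play game; moves: indices of the moves played from them.
        # z: +1 if the player making those moves went on to win, else -1.
        for x, a in zip(positions, moves):
            p = softmax(W @ x)
            grad = -np.outer(p, x)    # d log pi(a|x) / dW, softmax part
            grad[a] += x              # plus the chosen-action term
            W += lr * z * grad        # reinforce moves from won games
        return W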
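
  On the SL policy network training quote, purely to illustrate the
stale-gradient rule, here is a minimal sketch of asynchronous SGD where
a shared parameter server drops gradients computed against parameters
more than 100 steps old.  This is my guess at the mechanics, not
DistBelief's actual API.

    MAX_STALENESS = 100               # "gradients older than 100 steps"

    class ParamServer:
        # Hypothetical shared parameter store for asynchronous workers.
        def __init__(self, params):
            self.params = params      # e.g. a numpy array of weights
            self.step = 0             # global update counter

        def snapshot(self):
            # A worker grabs the current parameters together with the
            # global step they correspond to.
            return self.params.copy(), self.step

        def apply(self, grad, worker_step, lr=0.003):
            # Discard gradients computed from parameters that are
            # already more than MAX_STALENESS updates behind.
            if self.step - worker_step > MAX_STALENESS:
                return False
            self.params -= lr * grad
            self.step += 1
            return True

  Each of the 50 GPU workers would loop: snapshot(), compute a
minibatch gradient, then apply() it together with the step it started
from.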
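
  And on the "sampling the same game a few times" compromise for the
value network, a small sketch of how the training set could be built:
one (position, result) pair per game as in the paper, or several per
game if we accept more correlation to stretch a smaller corpus.  The
game-record format here is just a placeholder.

    import random

    def build_value_dataset(games, samples_per_game=1, rng=random):
        # games: list of (positions, result) pairs, where positions is
        #   the sequence of board states of one self-play game and
        #   result is the final outcome label to regress on.
        # samples_per_game=1 matches the paper (one position per game,
        #   to keep the targets decorrelated); raising it stretches a
        #   smaller corpus at the cost of more correlated samples.
        dataset = []
        for positions, result in games:
            k = min(samples_per_game, len(positions))
            for pos in rng.sample(positions, k):
                dataset.append((pos, result))
        rng.shuffle(dataset)
        return dataset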

  In summary, it seems to me that a big part of why this approach was
so successful is the huge computational resources applied to it, which
is of course an obstacle (except for the big IT companies).

  I think the next main avenue of research is exploring solutions that
are much less resource-hungry.  The main bottleneck is the hunger at
training time, not at play time.  Well, the strength of these NNs
running on a normal single-GPU machine is another big question mark,
of course.

                                Petr Baudis