I agree. Even on 19x19 you can use smaller searches. A 400-iteration MCTS is probably already a lot stronger than the raw network, especially if you are expanding every node (very different from a normal program at 400 playouts!). Some tuning of these mini-searches is important: surely you don't want a first-play urgency that explores every child node... I remember this little algorithmic detail was missing from the first paper as well.
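For illustration, a minimal sketch of what such a selection step could look like: PUCT-style selection with a tunable first-play urgency, so unvisited children get a fixed (typically pessimistic) value rather than an optimistic default, and a small search doesn't burn its whole budget expanding every child once. The `Node` class and the `c_puct`/`fpu_value` parameters are hypothetical names for this sketch, not from any actual program or the paper:

```python
import math

class Node:
    """Hypothetical MCTS node: prior from the policy net, visit stats."""
    def __init__(self, prior):
        self.prior = prior        # policy-network probability of this move
        self.visits = 0
        self.value_sum = 0.0
        self.children = []

def select_child(node, c_puct=1.5, fpu_value=0.0):
    """PUCT-style child selection with first-play urgency (FPU).

    Unvisited children score fpu_value instead of +infinity, so the
    search only tries new moves when their prior/urgency justifies it.
    """
    sqrt_total = math.sqrt(max(1, node.visits))
    best_child, best_score = None, -float("inf")
    for child in node.children:
        # Mean value for visited children; FPU estimate otherwise.
        q = child.value_sum / child.visits if child.visits > 0 else fpu_value
        # Exploration bonus, weighted by the network's prior.
        u = c_puct * child.prior * sqrt_total / (1 + child.visits)
        if q + u > best_score:
            best_child, best_score = child, q + u
    return best_child
```

With a pessimistic `fpu_value` the search keeps revisiting a good known child instead of touching every sibling first; with an optimistic one it degenerates into expanding all children before exploiting any.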
So that's a factor-32 gain. Because the network is smaller, it should learn much faster too. Someone on reddit posted a comparison of 20 blocks vs 40 blocks. With 10 people you can probably get some results in a few months. The question is how much Elo we lose along the way... Another advantage is that, as long as you keep all the SGF files, you can bootstrap a bigger network from the data! So nothing is lost by starting small. You can "upgrade" if the improvements start to plateau.

On Fri, Oct 20, 2017, 23:32 Álvaro Begué <alvaro.be...@gmail.com> wrote:

> I suggest scaling down the problem until some experience is gained.
>
> You don't need the full-fledged 40-block network to get started. You can
> probably get away with using only 20 blocks and maybe 128 features
> (instead of 256). That should save you about a factor of 8, plus you can
> use larger mini-batches.
>
> You can also start with 9x9 Go. That way games are shorter, and you
> probably don't need 1600 network evaluations per move to do well.
>
> Álvaro.
>
> On Fri, Oct 20, 2017 at 1:44 PM, Gian-Carlo Pascutto <g...@sjeng.org> wrote:
>
>> I reconstructed the full AlphaGo Zero network in Caffe:
>> https://sjeng.org/dl/zero.prototxt
>>
>> I did some performance measurements, with what should be
>> state-of-the-art consumer hardware:
>>
>> GTX 1080 Ti
>> NVIDIA-Caffe + CUDA 9 + cuDNN 7
>> batch size = 8
>>
>> Memory use is about 2 GB. (It's much more for training; the original
>> minibatch size of 32 wouldn't fit on this card!)
>>
>> Running 2000 iterations takes 93 seconds.
>>
>> In the AlphaGo Zero paper, they claim 0.4 seconds to do 1600 MCTS
>> simulations, and they expand 1 node per visit (if I got that right), so
>> that would be 1600 network evaluations as well, or 200 of my iterations.
>>
>> So it would take me ~9.3 s to produce a self-play move, compared to
>> 0.4 s for them.
>> I would like to extrapolate how long it will take to reproduce the
>> research, but I think I'm missing how many GPUs are in each self-play
>> worker (4 TPUs, 64 GPUs, or something else?), or perhaps the average
>> length of the games.
>>
>> Let's say the latter is around 200 moves. They generated 29 million
>> games for the final result, which means it's going to take me about
>> 1700 years to replicate this. I initially estimated 7 years based on
>> the reported 64 GPU vs 1 GPU, but this seems far worse. Did I miss
>> anything in the calculations above, or was it really a *pile* of those
>> 64-GPU machines?
>>
>> Because the playing performance seems reasonable (you would be able to
>> actually run the MCTS on a consumer machine, and hence end up with a
>> strong program), I would be interested in setting up a distributed
>> effort for this. But realistically maybe 10 people will join, 80 if
>> we're very lucky (going by the Stockfish numbers). That means it'd
>> still take 20 to 170 years.
>>
>> Someone please tell me I missed a factor of 100 or more somewhere. I'd
>> love to be wrong here.
>>
>> --
>> GCP
>>
>> _______________________________________________
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go

--
GCP
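The arithmetic in the thread can be checked directly. This sketch only restates the numbers given above (2000 iterations at batch 8 in 93 s, 1600 evaluations per move, 29 million games, an assumed 200 moves per game); nothing in it is new data:

```python
# Measured throughput: 2000 iterations at batch size 8 in 93 seconds.
evals_per_sec = 2000 * 8 / 93            # ~172 network evaluations/s
secs_per_move = 1600 / evals_per_sec     # 1600 evals per self-play move

games = 29_000_000       # self-play games reported for the final result
moves_per_game = 200     # assumed average game length

total_secs = games * moves_per_game * secs_per_move
years = total_secs / (365.25 * 24 * 3600)

print(f"{secs_per_move:.1f} s/move, ~{years:.0f} years on one GPU")
# Spread over 10 to 80 contributors, that is roughly years/80 .. years/10,
# i.e. the 20-170 year range mentioned above.
```

Running it reproduces the ~9.3 s per move and the roughly 1700-year single-GPU estimate from the thread.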