Re: [Computer-go] Zero performance
On 20/10/2017 22:48, fotl...@smart-games.com wrote:
> The paper describes 20 and 40 block networks, but the section on
> comparison says AlphaGo Zero uses 20 blocks. I think your protobuf
> describes a 40 block network. That's a factor of two.

They compared with both; the final 5180 Elo number is for the 40-block
network. For the 20-block one, the numbers stop around 4300 Elo. See for
example:
https://www.reddit.com/r/baduk/comments/77hr3b/elo_table_of_alphago_zero_selfplay_games/

A factor of 2 isn't much, but sure, it seems sensible to start with the
smaller one, given how intractable the problem looks right now.

> Your time looks reasonable when calculating the time to generate the
> 29M games at about 10 seconds per move. This is only the time to
> generate the input data. Do you have an estimate of the additional
> time it takes to do the training? It's probably small in comparison,
> but it might not be.

So far I've assumed that it's zero, because it can happen in parallel and
the time to generate the self-play games dominates. From the revised
hardware estimates, we can also see that the training machines used 64
GPUs, which is far smaller than the 1500+ TPU estimate for the self-play
machines.

Training on the GTX 1080 Ti does 4 batches of 32 positions per second.
They use 2048-position batches, and train for 1000 batches before
checkpointing. So the GTX can produce a checkpoint every ~4.5 hours [1].
Testing that checkpoint over 400 games takes about 8.6 days (400 x 200 x
9.3s). So again, it totally bottlenecks on playing games, not on
training. At least, if the improvement is big, one needn't play all 400
games out; SPRT termination can be used instead.

[1] To be honest, this seems very fast - even starting from zero, such a
big network barely advances in 1000 iterations (or I misinterpreted a
training parameter). But I guess it's important to have a very fast
learn-knowledge / use-new-knowledge feedback cycle.
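A quick sanity check of the checkpoint and testing arithmetic above, with all numbers taken directly from the post:

```python
# Checkpoint cadence: the GTX 1080 Ti trains 4 batches of 32 positions
# per second; DeepMind's batches are 2048 positions, 1000 batches per
# checkpoint.
POSITIONS_PER_SEC = 4 * 32
BATCH_SIZE = 2048
BATCHES_PER_CHECKPOINT = 1000

checkpoint_hours = BATCHES_PER_CHECKPOINT * BATCH_SIZE / POSITIONS_PER_SEC / 3600
print(f"checkpoint every {checkpoint_hours:.1f} hours")   # ~4.4 hours

# Testing a checkpoint: 400 games, ~200 moves each, 9.3 s per move.
GAMES, MOVES_PER_GAME, SECS_PER_MOVE = 400, 200, 9.3
testing_days = GAMES * MOVES_PER_GAME * SECS_PER_MOVE / 86400
print(f"testing a checkpoint: {testing_days:.1f} days")   # ~8.6 days
```

This confirms the claim that game playing, not training, is the bottleneck: one checkpoint takes hours to produce but days to evaluate.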
--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
Re: [Computer-go] Zero performance
On 20/10/2017 22:41, Sorin Gherman wrote:
> Training of AlphaGo Zero has been done on thousands of TPUs,
> according to this source:
> https://www.reddit.com/r/baduk/comments/777ym4/alphago_zero_learning_from_scratch_deepmind/dokj1uz/?context=3
>
> Maybe that should explain the difference in orders of magnitude that
> you noticed?

That would make a lot more sense, for sure. It would also explain the
25M USD number from Hassabis. That would be a lot of money to spend on
"only" 64 GPUs, or 4 TPUs (which are supposed to be roughly ~1 GPU
each). There's no explanation of where the number came from, but it
seems he did similar math to the original post here.

--
GCP
Re: [Computer-go] Zero performance
I agree. Even on 19x19 you can use smaller searches. A 400-iteration
MCTS is probably already a lot stronger than the raw network, especially
if you are expanding every node (very different from a normal program at
400 playouts!). Some tuning of these mini-searches is important. Surely
you don't want to explore every child node for the first play urgency...
I remember this little algorithmic detail was missing from the first
paper as well.

So that's a factor of 32 gained. Because the network is smaller, it
should learn much faster too. Someone on reddit posted a comparison of
20 blocks vs 40 blocks. With 10 people you can probably get some results
in a few months. The question is, how much Elo have we lost on the way...

Another advantage would be that, as long as you keep all the SGF, you
can bootstrap a bigger network from the data! So nothing is lost by
starting small. You can "upgrade" if the improvements start to plateau.

On Fri, Oct 20, 2017, 23:32 Álvaro Begué wrote:
> I suggest scaling down the problem until some experience is gained.
>
> You don't need the full-fledged 40-block network to get started. You can
> probably get away with using only 20 blocks and maybe 128 features (from
> 256). That should save you about a factor of 8, plus you can use larger
> mini-batches.
>
> You can also start with 9x9 go. That way games are shorter, and you
> probably don't need 1600 network evaluations per move to do well.
>
> Álvaro.
>
> On Fri, Oct 20, 2017 at 1:44 PM, Gian-Carlo Pascutto wrote:
>
>> I reconstructed the full AlphaGo Zero network in Caffe:
>> https://sjeng.org/dl/zero.prototxt
>>
>> I did some performance measurements, with what should be
>> state-of-the-art on consumer hardware:
>>
>> GTX 1080 Ti
>> NVIDIA-Caffe + CUDA 9 + cuDNN 7
>> batch size = 8
>>
>> Memory use is about ~2G. (It's much more for learning; the original
>> minibatch size of 32 wouldn't fit on this card!)
>>
>> Running 2000 iterations takes 93 seconds.
>>
>> In the AlphaGo paper, they claim 0.4 seconds to do 1600 MCTS
>> simulations, and they expand 1 node per visit (if I got it right), so
>> that would be 1600 network evaluations as well, or 200 of my iterations.
>>
>> So it would take me ~9.3s to produce a self-play move, compared to 0.4s
>> for them.
>>
>> I would like to extrapolate how long it will take to reproduce the
>> research, but I think I'm missing how many GPUs are in each self-play
>> worker (4 TPU or 64 GPU or ?), or perhaps the average length of the games.
>>
>> Let's say the latter is around 200 moves. They generated 29 million
>> games for the final result, which means it's going to take me about 1700
>> years to replicate this. I initially estimated 7 years based on the
>> reported 64 GPU vs 1 GPU, but this seems far worse. Did I miss anything
>> in the calculations above, or was it really a *pile* of those 64-GPU
>> machines?
>>
>> Because the performance on playing seems reasonable (you would be able
>> to actually run the MCTS on a consumer machine, and hence end up with a
>> strong program), I would be interested in setting up a distributed
>> effort for this. But realistically there will be maybe 10 people
>> joining, 80 if we're very lucky (looking at Stockfish numbers). That
>> means it'd still take 20 to 170 years.
>>
>> Someone please tell me I missed a factor of 100 or more somewhere. I'd
>> love to be wrong here.
>>
>> --
>> GCP

--
GCP
Re: [Computer-go] Zero performance
> You can also start with 9x9 go. That way games are shorter, and you
> probably don't need 1600 network evaluations per move to do well.

Bonus points if you can have it play on GoQuest, where many of us can
enjoy watching its progress, or even challenge it...

regards,
-John
Re: [Computer-go] Zero performance
The paper describes 20 and 40 block networks, but the section on
comparison says AlphaGo Zero uses 20 blocks. I think your protobuf
describes a 40 block network. That's a factor of two. If you only want
pro strength rather than superhuman, you can train for half their time.

Your time looks reasonable when calculating the time to generate the 29M
games at about 10 seconds per move. This is only the time to generate
the input data. Do you have an estimate of the additional time it takes
to do the training? It's probably small in comparison, but it might not
be.

My plan is to start out with a little supervised learning, since I'm not
trying to prove a breakthrough. I experimented last year for a few
months with res-nets for a policy network, and there are some things I
discovered there that probably apply to this network. They should give
perhaps a factor of 5 to 10 speedup. For a commercial program I'll be
happy with 7-dan amateur strength after about 6 months of training using
my two GPUs and sixteen i7 cores.

David

-----Original Message-----
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf
Of Gian-Carlo Pascutto
Sent: Friday, October 20, 2017 10:45 AM
To: computer-go@computer-go.org
Subject: [Computer-go] Zero performance

I reconstructed the full AlphaGo Zero network in Caffe:
https://sjeng.org/dl/zero.prototxt

I did some performance measurements, with what should be
state-of-the-art on consumer hardware:

GTX 1080 Ti
NVIDIA-Caffe + CUDA 9 + cuDNN 7
batch size = 8

Memory use is about ~2G. (It's much more for learning; the original
minibatch size of 32 wouldn't fit on this card!)

Running 2000 iterations takes 93 seconds.

In the AlphaGo paper, they claim 0.4 seconds to do 1600 MCTS
simulations, and they expand 1 node per visit (if I got it right), so
that would be 1600 network evaluations as well, or 200 of my iterations.

So it would take me ~9.3s to produce a self-play move, compared to 0.4s
for them.
I would like to extrapolate how long it will take to reproduce the
research, but I think I'm missing how many GPUs are in each self-play
worker (4 TPU or 64 GPU or ?), or perhaps the average length of the
games.

Let's say the latter is around 200 moves. They generated 29 million
games for the final result, which means it's going to take me about 1700
years to replicate this. I initially estimated 7 years based on the
reported 64 GPU vs 1 GPU, but this seems far worse. Did I miss anything
in the calculations above, or was it really a *pile* of those 64-GPU
machines?

Because the performance on playing seems reasonable (you would be able
to actually run the MCTS on a consumer machine, and hence end up with a
strong program), I would be interested in setting up a distributed
effort for this. But realistically there will be maybe 10 people
joining, 80 if we're very lucky (looking at Stockfish numbers). That
means it'd still take 20 to 170 years.

Someone please tell me I missed a factor of 100 or more somewhere. I'd
love to be wrong here.

--
GCP
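The extrapolation above can be sketched in a few lines; 29M games and the assumed 200 moves per game are from the post, and 9.3 s/move is the measured single-GPU figure:

```python
GAMES = 29_000_000      # self-play games DeepMind generated
MOVES_PER_GAME = 200    # assumed average game length
SECS_PER_MOVE = 9.3     # measured: 1600 net evals on one GTX 1080 Ti

total_secs = GAMES * MOVES_PER_GAME * SECS_PER_MOVE
years = total_secs / (365.25 * 24 * 3600)
print(f"{years:.0f} years on a single GPU")          # ~1700 years

# Distributed effort, as estimated from Stockfish participation numbers:
for contributors in (10, 80):
    print(f"{contributors} contributors: {years / contributors:.0f} years")
```

This reproduces both the ~1700-year single-GPU figure and the "20 to 170 years" range for a 10-to-80-contributor distributed effort.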
Re: [Computer-go] Zero performance
Training of AlphaGo Zero has been done on thousands of TPUs, according
to this source:
https://www.reddit.com/r/baduk/comments/777ym4/alphago_zero_learning_from_scratch_deepmind/dokj1uz/?context=3

Maybe that should explain the difference in orders of magnitude that you
noticed?

On Fri, Oct 20, 2017 at 10:44 AM, Gian-Carlo Pascutto wrote:
> I reconstructed the full AlphaGo Zero network in Caffe:
> https://sjeng.org/dl/zero.prototxt
>
> I did some performance measurements, with what should be
> state-of-the-art on consumer hardware:
>
> GTX 1080 Ti
> NVIDIA-Caffe + CUDA 9 + cuDNN 7
> batch size = 8
>
> Memory use is about ~2G. (It's much more for learning; the original
> minibatch size of 32 wouldn't fit on this card!)
>
> Running 2000 iterations takes 93 seconds.
>
> In the AlphaGo paper, they claim 0.4 seconds to do 1600 MCTS
> simulations, and they expand 1 node per visit (if I got it right), so
> that would be 1600 network evaluations as well, or 200 of my iterations.
>
> So it would take me ~9.3s to produce a self-play move, compared to 0.4s
> for them.
>
> I would like to extrapolate how long it will take to reproduce the
> research, but I think I'm missing how many GPUs are in each self-play
> worker (4 TPU or 64 GPU or ?), or perhaps the average length of the games.
>
> Let's say the latter is around 200 moves. They generated 29 million
> games for the final result, which means it's going to take me about 1700
> years to replicate this. I initially estimated 7 years based on the
> reported 64 GPU vs 1 GPU, but this seems far worse. Did I miss anything
> in the calculations above, or was it really a *pile* of those 64-GPU
> machines?
>
> Because the performance on playing seems reasonable (you would be able
> to actually run the MCTS on a consumer machine, and hence end up with a
> strong program), I would be interested in setting up a distributed
> effort for this.
> But realistically there will be maybe 10 people
> joining, 80 if we're very lucky (looking at Stockfish numbers). That
> means it'd still take 20 to 170 years.
>
> Someone please tell me I missed a factor of 100 or more somewhere. I'd
> love to be wrong here.
>
> --
> GCP
Re: [Computer-go] Zero performance
I suggest scaling down the problem until some experience is gained.

You don't need the full-fledged 40-block network to get started. You can
probably get away with using only 20 blocks and maybe 128 features (from
256). That should save you about a factor of 8, plus you can use larger
mini-batches.

You can also start with 9x9 go. That way games are shorter, and you
probably don't need 1600 network evaluations per move to do well.

Álvaro.

On Fri, Oct 20, 2017 at 1:44 PM, Gian-Carlo Pascutto wrote:
> I reconstructed the full AlphaGo Zero network in Caffe:
> https://sjeng.org/dl/zero.prototxt
>
> I did some performance measurements, with what should be
> state-of-the-art on consumer hardware:
>
> GTX 1080 Ti
> NVIDIA-Caffe + CUDA 9 + cuDNN 7
> batch size = 8
>
> Memory use is about ~2G. (It's much more for learning; the original
> minibatch size of 32 wouldn't fit on this card!)
>
> Running 2000 iterations takes 93 seconds.
>
> In the AlphaGo paper, they claim 0.4 seconds to do 1600 MCTS
> simulations, and they expand 1 node per visit (if I got it right), so
> that would be 1600 network evaluations as well, or 200 of my iterations.
>
> So it would take me ~9.3s to produce a self-play move, compared to 0.4s
> for them.
>
> I would like to extrapolate how long it will take to reproduce the
> research, but I think I'm missing how many GPUs are in each self-play
> worker (4 TPU or 64 GPU or ?), or perhaps the average length of the games.
>
> Let's say the latter is around 200 moves. They generated 29 million
> games for the final result, which means it's going to take me about 1700
> years to replicate this. I initially estimated 7 years based on the
> reported 64 GPU vs 1 GPU, but this seems far worse. Did I miss anything
> in the calculations above, or was it really a *pile* of those 64-GPU
> machines?
>
> Because the performance on playing seems reasonable (you would be able
> to actually run the MCTS on a consumer machine, and hence end up with a
> strong program), I would be interested in setting up a distributed
> effort for this. But realistically there will be maybe 10 people
> joining, 80 if we're very lucky (looking at Stockfish numbers). That
> means it'd still take 20 to 170 years.
>
> Someone please tell me I missed a factor of 100 or more somewhere. I'd
> love to be wrong here.
>
> --
> GCP
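The factor-of-8 estimate above can be checked with a rough cost model: a convolutional layer's compute scales with in_channels x out_channels, i.e. with the square of the feature count, and linearly with the block count. This sketch ignores the fixed-cost input and head layers, so it is an upper bound on the saving:

```python
def relative_cost(blocks, features, ref_blocks=40, ref_features=256):
    """Compute cost of a residual tower relative to the full
    AlphaGo Zero configuration (rough model: cost ~ blocks * features^2)."""
    return (blocks / ref_blocks) * (features / ref_features) ** 2

# Halving blocks gives 2x; halving features gives 4x; combined ~8x.
speedup = 1 / relative_cost(20, 128)
print(f"speedup ~{speedup:.0f}x")   # ~8x
```

The "larger mini-batches" point follows from the same scaling: a network with one eighth the compute also needs far less activation memory per position, so more positions fit on the card at once.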