Re: [Computer-go] Possible idea - decay old simulations?
> I haven't tried it, but (with the computer chess hat on) these kind of
> proposals behave pretty badly when you get into situations where your
> evaluation is off and there are horizon effects.

In computer Go, this issue centers on cases where the initial move ordering is bad; it isn't so much evaluation/horizon. For example, at some node moves A, B, C, and D are seen as the best moves, but they all fail. So perhaps you invest 2000 trials in each, with a 40% win rate overall. Now you come to move E, which is best. In fact, E is a refutation, with a 70% win rate. The question is: how long will it take before this node is considered to be refuted?

The problem is that we started the node badly. When E is first explored, the node has 3200 wins, 4800 losses. Even with a 70% win rate, E needs 4000 trials (= 2800 wins, 1200 losses) just to pull the node back to a 50% win rate. But if you had explored E first, then this node would have been considered refuted after maybe a few hundred trials. There is literally a factor of 100 difference in search efficiency.

This situation seems odd, but arises frequently. For example, E is a winning move now, but becomes a losing move if played in the future. (For example, E is in a capturing race that is currently winning for the side to move first.) Unless E is tried early in the node's lifecycle, the RAVE score for E is very low.

>> I recall a paper published on this basis. A paper presumably about
>> CrazyStone: Efficient Selectivity and Backup Operators in Monte-Carlo
>> Tree Search.

> I'm reasonably sure this did not include forgetting/discounting, only
> shifting between average and maximum by focusing simulations near the
> maximum.

Yes, but emphasizing the maximum is equivalent to forgetting. E.g., in my example, giving E a larger weight is equivalent to giving A, B, C, and D a lower weight (== forgetting).
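The arithmetic in the example can be checked with a short sketch. The helper name is mine, not from any engine; it just solves for the number of trials at a given win rate needed to lift the node's overall win rate back to a target:

```python
def trials_to_reach(target, wins, total, winrate):
    """Simulations at `winrate` needed before the node's overall
    win rate climbs back up to `target`.

    Solves (wins + winrate * t) / (total + t) = target for t."""
    return (target * total - wins) / (winrate - target)

# A..D: 4 moves x 2000 trials at a 40% win rate -> 3200 wins out of 8000.
# E wins at 70%: how many trials of E until the node is back to 50%?
t = trials_to_reach(0.5, wins=3200, total=8000, winrate=0.7)
print(round(t))  # 4000
```

Had E been tried first (wins=0, total=0), the node's win rate would have tracked E's 70% from the start, which is why the ordering dominates the cost.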
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
Re: [Computer-go] Possible idea - decay old simulations?
The tree built by MCTS is very unbalanced - some branches are explored more thoroughly than others. Tweaking the algorithm to favor the newer results might result in an overall improvement, but it would also be subject to all the pitfalls of partially or unevenly evaluated trees. This is especially true if the creation of new nodes is stopped due to memory limitations.
Re: [Computer-go] Possible idea - decay old simulations?
Cool, thanks.

On Mon, Jul 24, 2017 at 10:30 AM, Gian-Carlo Pascutto wrote:
> On 24-07-17 16:07, David Wu wrote:
> > Hmm. Why would discounting make things worse? Do you mean that you
> > want the top move to drop off slower (i.e. for the bot to take longer
> > to achieve the correct valuation of the top move) to give it "time"
> > to search the other moves enough to find that they're also bad?
>
> I don't want the top move to drop off slower, I just don't want to play
> other moves until they've been searched to comparable "depth".
>
> If there's a disaster lurking behind the main variation that we only
> just started to understand, the odds are the same disaster also lurks
> in a few of the alternative moves.
>
> > I would have thought that with typical exploration policies, whether
> > the top move drops off a little faster or a little slower, once its
> > winrate drops down close to the other moves, the other moves should
> > get a lot of simulations as well.
>
> Yes. But the goal of the discounting is that a new move can make it
> above the old one, despite having had less total search effort.
>
> My point is that it is not always clear this is a positive effect.
>
> > I know that there are ways to handle this at the root, via time
> > control or otherwise.
>
> The situation isn't necessarily different here, if you consider that at
> the root the best published technique is still "think longer so the new
> move can overtake the old one", not "play the new move".
>
> Anyway, not saying this can't work. Just pointing out the problem areas.
>
> I would be a bit surprised if discounting worked for Go, because it's
> been published for other areas (e.g. Amazons) but I don't remember any
> reports of success in Go. But the devil can be in the details (i.e. the
> discounting formula) for tricks like this.
>
> --
> GCP
Re: [Computer-go] Possible idea - decay old simulations?
On 24-07-17 16:07, David Wu wrote:
> Hmm. Why would discounting make things worse? Do you mean that you
> want the top move to drop off slower (i.e. for the bot to take longer
> to achieve the correct valuation of the top move) to give it "time"
> to search the other moves enough to find that they're also bad?

I don't want the top move to drop off slower, I just don't want to play other moves until they've been searched to comparable "depth".

If there's a disaster lurking behind the main variation that we only just started to understand, the odds are the same disaster also lurks in a few of the alternative moves.

> I would have thought that with typical exploration policies, whether
> the top move drops off a little faster or a little slower, once its
> winrate drops down close to the other moves, the other moves should
> get a lot of simulations as well.

Yes. But the goal of the discounting is that a new move can make it above the old one, despite having had less total search effort. My point is that it is not always clear this is a positive effect.

> I know that there are ways to handle this at the root, via time
> control or otherwise.

The situation isn't necessarily different here, if you consider that at the root the best published technique is still "think longer so the new move can overtake the old one", not "play the new move".

Anyway, not saying this can't work. Just pointing out the problem areas. I would be a bit surprised if discounting worked for Go, because it's been published for other areas (e.g. Amazons) but I don't remember any reports of success in Go. But the devil can be in the details (i.e. the discounting formula) for tricks like this.

--
GCP
Re: [Computer-go] Possible idea - decay old simulations?
Thanks for the replies!

On Mon, Jul 24, 2017 at 9:30 AM, Gian-Carlo Pascutto wrote:
> On 23-07-17 18:24, David Wu wrote:
> > Has anyone tried this sort of idea before?
>
> I haven't tried it, but (with the computer chess hat on) these kind of
> proposals behave pretty badly when you get into situations where your
> evaluation is off and there are horizon effects. The top move drops off
> and now every alternative that has had less search looks better (because
> it hasn't seen the disaster yet). You do not want discounting in this
> situation.

Hmm. Why would discounting make things worse? Do you mean that you want the top move to drop off slower (i.e. for the bot to take longer to achieve the correct valuation of the top move) to give it "time" to search the other moves enough to find that they're also bad?

I would have thought that with typical exploration policies, whether the top move drops off a little faster or a little slower, once its winrate drops down close to the other moves, the other moves should get a lot of simulations as well.

> It's true that a move with a superior winrate to the move with the
> maximum amount of simulations is a good candidate to be better. Some
> engines will extend the time when this happens. Leela will play it, in
> certain conditions.

I know that there are ways to handle this at the root, via time control or otherwise. The case I described here is when this happens not at the root, but deeper in the tree. At the root, move B still looks much worse than A; it's merely that within B's subtree there's a newly found tactic C that is much better than A. From the root, B still looks worse than A, but its winrate has recently started to rise slowly and extremely steadily (with very high statistical confidence, such that one might project it to eventually overtake A).
Re: [Computer-go] Possible idea - decay old simulations?
Yes. This is a long-known phenomenon. I was able to get improvements in Pebbles based on the idea of forgetting unsuccessful results. It has to be done somewhat carefully, because results propagate up the tree. But you can definitely make it work.

I recall a paper published on this basis. A paper presumably about CrazyStone: "Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search". I see a paper called "Accelerated UCT and Its Application to Two-Player Games". Could be others.

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of David Wu
Sent: Sunday, July 23, 2017 12:25 PM
To: computer-go@computer-go.org
Subject: [Computer-go] Possible idea - decay old simulations?

I've been using Leela 0.10.0 for analysis quite often, and I've noticed something that might lead to an improvement for the search, and maybe also for other MCTS programs.

Sometimes, after putting hundreds of thousands of simulations into a few possible moves, Leela will appear to favor one, disliking the others for having clearly worse reported winrates. But then every once in a while, the winrate for one of the other disliked moves will start rising gradually, but very consistently. When this happens, if I play down the variation for that move and look at the analysis window, I often find that Leela has discovered a new tactic. Specifically, I find a node in that subtree where one move has a greatly higher winrate than all the others, but does not have too many simulations yet, meaning Leela only just now found it. (Possibly it already has more simulations than any other single move, but the number of simulations of all of the other moves combined still significantly outweighs it.)

Going back to the root, it's clear that if the new tactic has a high enough winrate, then the previously disliked move will eventually overtake the favored move. But it takes a long time, since the disliked move has a lot of bad simulations to outweigh - it's painful to watch the winrate creep up slowly but with high upward consistency, until it finally beats out the previously favored move.

I think there's a chance that the search could be improved by adding a small decay over time to the weight of old simulations. This would allow a move to be promoted a bit more rapidly with the discovery of a better tactic. You would probably want to set the decay so that the total weight over time still readily approaches infinity (e.g. a fixed exponential decay would probably be bad, since that would bound the total weight by a constant), but perhaps grows a bit slower than linearly.

Thinking about it from the multi-armed-bandit perspective, I think this also makes sense. The distribution of results from each child is nonstationary, because the subtree below the child is evolving over time. If they were stationary you would weight all historical simulations equally, but since they aren't, the more recent results from a child should get a little more weight, since they give you more information about the current performance of the child move.

Has anyone tried this sort of idea before?
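Brian's caveat above (forgetting has to be done carefully because results propagate up the tree) can be sketched minimally. This is a hypothetical node structure of my own, ignoring the side-to-move sign flip a real engine would need:

```python
class Node:
    """Minimal MCTS node with plain win/visit counters."""

    def __init__(self, parent=None):
        self.parent = parent
        self.wins = 0
        self.visits = 0

    def backup(self, result):
        # Normal MCTS backup: every playout is counted at every
        # node on the path from this node to the root.
        node = self
        while node is not None:
            node.wins += result
            node.visits += 1
            node = node.parent

    def forget(self, wins, visits):
        # Discarding old results only at this node would leave each
        # ancestor's statistics stale, because those same playouts
        # were counted there too. So subtract along the whole path.
        node = self
        while node is not None:
            node.wins -= wins
            node.visits -= visits
            node = node.parent
```

If `forget` did not walk up to the root, an ancestor could report more visits than its children combined, which is exactly the kind of inconsistency Brian warns needs careful handling.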
[Computer-go] Possible idea - decay old simulations?
I've been using Leela 0.10.0 for analysis quite often, and I've noticed something that might lead to an improvement for the search, and maybe also for other MCTS programs.

Sometimes, after putting hundreds of thousands of simulations into a few possible moves, Leela will appear to favor one, disliking the others for having clearly worse reported winrates. But then every once in a while, the winrate for one of the other disliked moves will start rising gradually, but very consistently. When this happens, if I play down the variation for that move and look at the analysis window, I often find that Leela has discovered a new tactic. Specifically, I find a node in that subtree where one move has a greatly higher winrate than all the others, but does not have too many simulations yet, meaning Leela only just now found it. (Possibly it already has more simulations than any other single move, but the number of simulations of all of the other moves combined still significantly outweighs it.)

Going back to the root, it's clear that if the new tactic has a high enough winrate, then the previously disliked move will eventually overtake the favored move. But it takes a long time, since the disliked move has a lot of bad simulations to outweigh - it's painful to watch the winrate creep up slowly but with high upward consistency, until it finally beats out the previously favored move.

I think there's a chance that the search could be improved by adding a small decay over time to the weight of old simulations. This would allow a move to be promoted a bit more rapidly with the discovery of a better tactic. You would probably want to set the decay so that the total weight over time still readily approaches infinity (e.g. a fixed exponential decay would probably be bad, since that would bound the total weight by a constant), but perhaps grows a bit slower than linearly.

Thinking about it from the multi-armed-bandit perspective, I think this also makes sense. The distribution of results from each child is nonstationary, because the subtree below the child is evolving over time. If they were stationary you would weight all historical simulations equally, but since they aren't, the more recent results from a child should get a little more weight, since they give you more information about the current performance of the child move.

Has anyone tried this sort of idea before?
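One way to instantiate the "total weight grows unboundedly, but slower than linearly" idea is a time-varying discount: multiply the accumulated totals by gamma_n = 1 - c/sqrt(n) at each update, so the effective visit count grows roughly like sqrt(n)/c. This is purely a sketch of my own; the square-root schedule and the constant c are illustrative choices, not something the thread specifies:

```python
class DecayedStats:
    """Win-rate tracker that gradually discounts old simulations.

    Each update multiplies the running totals by gamma_n = 1 - c/sqrt(n),
    so the effective total weight still diverges (unlike a fixed
    gamma < 1, which would cap it at 1/(1 - gamma)) but grows slower
    than the raw simulation count.
    """

    def __init__(self, c=0.5):
        self.c = c
        self.n = 0         # raw number of simulations seen
        self.weight = 0.0  # decayed "visit count"
        self.value = 0.0   # decayed win total

    def add(self, result):  # result in [0, 1]
        self.n += 1
        gamma = max(0.0, 1.0 - self.c / self.n ** 0.5)
        self.weight = gamma * self.weight + 1.0
        self.value = gamma * self.value + result

    def winrate(self):
        return self.value / self.weight
```

With this in place, a node whose results jump from a long 40% stretch to 70% (as in the refutation scenario discussed earlier in the thread) crosses 50% after on the order of a hundred new simulations, instead of the thousands a plain running mean needs, because the old mass has partly decayed away.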