On 22-05-17 21:01, Marc Landgraf wrote:
> But what you should really look at here is Leela's evaluation of the game.
Note that this is completely irrelevant to the discussion about tactical holes and the position I posted. You could plug literally any evaluation into it (save for a static oracle, in which case why search at all...) and it would still exhibit the tactical blindness being discussed.

It's an issue of the limitations of the policy network, combined with the way the UCT formula is used. I'll use the one from the original AlphaGo paper here, because it's public and should behave even worse:

u(s, a) = c_puct * P(s, a) * sqrt(total_visits) / (1 + child_visits)

Note that P(s, a) is a direct factor here, which means that for a move ignored by the policy network, the UCT term will almost vanish. In other words, unless the win is immediately visible (and for tactics it won't be), you're not going to find it.

Also note that this is a deviation from regular UCT or PUCT, which have no such direct term and hence only a decaying prior, making the search eventually more exploratory.

Now, even the original AlphaGo played moves that surprised human pros and ran contrary to established sequences. So where did those come from? Enough computation power to overcome the low probability? Synthesized by inference in the (much larger than mine) policy network?

-- 
GCP
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
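To make the point concrete, here is a minimal sketch (my own illustration, not Leela's or DeepMind's actual code) of the AlphaGo-style selection term above. With the prior as a direct factor, a move the policy network all but ignores gets a vanishingly small exploration bonus no matter how many playouts the parent has accumulated:

```python
import math

def puct_term(prior, total_visits, child_visits, c_puct=1.0):
    # u(s, a) = c_puct * P(s, a) * sqrt(total_visits) / (1 + child_visits)
    # The prior P(s, a) multiplies the whole term, so a near-zero prior
    # suppresses the move's selection score at any visit count.
    return c_puct * prior * math.sqrt(total_visits) / (1 + child_visits)

# Two unvisited moves after 10,000 playouts at the parent node:
# one the policy network likes, one it effectively rules out.
liked   = puct_term(prior=0.30,   total_visits=10_000, child_visits=0)
ignored = puct_term(prior=0.0001, total_visits=10_000, child_visits=0)
print(liked, ignored)  # 30.0 vs 0.01: a 3000x gap before any value signal
```

In plain UCT the exploration term depends only on visit counts, so the neglected move would eventually be tried; here the gap never closes unless its value estimate does the work.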