On 23-05-17 17:19, Hideki Kato wrote:
> Gian-Carlo Pascutto: <0357614a-98b8-6949-723e-e1a849c75...@sjeng.org>:
>
>> Now, even the original AlphaGo played moves that surprised human pros
>> and were contrary to established sequences. So where did those come
>> from? Enough computation power to overcome the low probability?
>> Synthesized by inference from the (much larger than mine) policy network?
>
> Demis Hassabis said in a talk:
> After the game with Sedol, the team used "adversarial learning" in
> order to fill the holes in policy net (such as the Sedol's winning
> move in the game 4).

I said, the "original AlphaGo", i.e. the one used in the match against
Lee Sedol. According to the Nature paper, the policy net was trained
with supervised learning only [1].

And yet...

In the attached SGF, AlphaGo played P10, which was considered a very
surprising move by all commentators. Presumably, this means it's not
seen in high level human play, and would not get a high rating in the
policy net. I can sort-of confirm this:

0.295057654 (E13)
...(60 more moves follow)...
0.000011952 (P10)

So, 0.001% probability. Demis commented that Lee Sedol's winning move in
game 4 was a one in 10 000 move. This is a 1 in 100 000 move.
Differently trained policy nets might rate it a bit higher or lower, but
simply due to the fact that it was considered very un-human to play, it
seems unlikely to ever be rated highly by a policy net based on
supervised learning.

So in AlphaGo's formula, you're dealing with a reduction of the UCT term
by a factor 100 000 plus or minus some order of magnitude.

D6 -> 1359934 (W: 53.21%) (U: 49.34%) (V: 55.15%: 38918) (N: 6.3%) PV: D6 F6 E7 F7 C8 B8 D7 B7 E9 C9 F8 H7 H9 K7 H3 K9
...many moves...
P10 -> 421 (W: 52.68%) (U: 50.09%) (V: 53.98%: 8) (N: 0.0%) PV: P10 Q10 P8 Q9

Now, of course AlphaGo had a few orders of magnitude more hardware, but
you can see from the above that it's, eh, not easy for P10 to overtake
the top moves here in playout count.

And yet, that's the move that was played.

[1] I'm assuming that what played the match corresponds to what they
published there - maybe that is my mistake. I'm not sure I remember the
relevant timeline correctly.

-- 
GCP
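
To make the "reduction of the UCT term" above concrete, here is a rough sketch of the exploration bonus in the form given in the Nature paper, u(s, a) = c_puct * P(s, a) * sqrt(N_parent) / (1 + N(s, a)). The two priors are the policy-net outputs quoted above; the function names, the value of c_puct and the parent visit count are assumptions chosen only for illustration, not values taken from the program that produced the search output.

```python
import math


def puct_bonus(prior, parent_visits, child_visits, c_puct=5.0):
    """Exploration bonus u = c_puct * P * sqrt(N_parent) / (1 + N_child).

    c_puct is an illustrative constant here, not a tuned value.
    """
    return c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)


def bonus_ratio(prior_a, prior_b, parent_visits=1_000_000):
    """How many times larger the bonus of move a is than that of move b,
    assuming both children are still unvisited (hypothetical helper)."""
    return (puct_bonus(prior_a, parent_visits, 0)
            / puct_bonus(prior_b, parent_visits, 0))


# Policy-net outputs quoted in the mail.
PRIOR_E13 = 0.295057654
PRIOR_P10 = 0.000011952

if __name__ == "__main__":
    print(f"E13 vs P10 bonus ratio: {bonus_ratio(PRIOR_E13, PRIOR_P10):,.0f}")
    print(f"prior 1.0 vs P10 bonus ratio: {bonus_ratio(1.0, PRIOR_P10):,.0f}")
```

With equal visit counts everything except the prior cancels, so the ratio is just the ratio of the priors: roughly 25 000 against E13 and roughly 84 000 against a hypothetical move with prior 1, which is in line with the "factor 100 000 plus or minus some order of magnitude" estimate.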
sedol.sgf
Description: application/go-sgf
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go