Re: [Computer-go] mini-max with Policy and Value network

Hideki Kato Tue, 23 May 2017 10:09:38 -0700

Gian-Carlo Pascutto: <[email protected]>:


>Now, even the original AlphaGo played moves that surprised human pros
>and were contrary to established sequences. So where did those come
>from? Enough computation power to overcome the low probability?
>Synthesized by inference from the (much larger than mine) policy network?

Demis Hassabis said in a talk:
After the match with Lee Sedol, the team used "adversarial learning" to
fill the holes in the policy net (such as Lee Sedol's winning move in
game 4).

Hideki

-- 
Hideki Kato <mailto:[email protected]>
_______________________________________________
Computer-go mailing list
[email protected]
http://computer-go.org/mailman/listinfo/computer-go
