On Fri, Mar 11, 2016 at 09:33:52AM +0100, Robert Jasiek wrote:
> On 11.03.2016 08:24, Huazuo Gao wrote:
> >Points at the center of the board indeed depend on the full board, but
> >points near the edge do not.
>
> I have been wondering how AlphaGo could improve so much between the Fan Hui
> and Lee Sedol matches, including learning sente and showing clearer signs of
> more global, longer-term planning. One rumour suggests they simply used the
> time for more learning, but I'd be surprised if that alone had sufficed.
My personal hypothesis so far is that it might: REINFORCE may scale
remarkably well, and simply continuing to apply it (or perhaps sampling
more often to get more data points per game; once per game has always
seemed quite conservative to me) could make AlphaGo amazingly strong.
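To make the REINFORCE part concrete, here is a minimal sketch of the update
being scaled up, with the game, the features and the win rule all replaced by
toy stand-ins (nothing below is AlphaGo's actual code):

```python
import numpy as np

# Plain REINFORCE on self-play, reduced to a toy so it runs standalone.
# The "game" (random features, a made-up win rule) is a placeholder, but
# the update is the rule the AlphaGo paper describes for its RL policy net:
#     theta += alpha * z * grad log pi(a | s)
# with z the final result from the perspective of the player who moved.

N_MOVES = 9        # toy action space (think: candidate board points)
N_FEATURES = 16    # toy state-feature size
ALPHA = 0.01       # learning rate

rng = np.random.default_rng(0)
theta = np.zeros((N_FEATURES, N_MOVES))   # linear softmax policy weights

def policy(x):
    """Move probabilities for feature vector x under the current weights."""
    logits = x @ theta
    p = np.exp(logits - logits.max())
    return p / p.sum()

def play_selfplay_game(length=20):
    """Sample one toy self-play game; return its trajectory and outcome."""
    trajectory, totals = [], [0, 0]
    for t in range(length):
        x = rng.standard_normal(N_FEATURES)    # stand-in for board features
        p = policy(x)
        a = rng.choice(N_MOVES, p=p)
        player = t % 2
        trajectory.append((x, a, p, player))
        totals[player] += a
    outcome = 1 if totals[0] > totals[1] else -1   # made-up win rule
    return trajectory, outcome

def reinforce_update(trajectory, outcome):
    """Apply the REINFORCE gradient for every move of one finished game."""
    global theta
    for x, a, p, player in trajectory:
        z = outcome if player == 0 else -outcome   # result for the mover
        grad_logits = -p                           # d(log softmax)/d(logits)
        grad_logits[a] += 1.0
        theta += ALPHA * z * np.outer(x, grad_logits)

for _ in range(1000):      # "continued application" is just more games
    traj, z = play_selfplay_game()
    reinforce_update(traj, z)
```

The point of the sketch is that the algorithm itself has no obvious ceiling:
training longer just means more passes through the same update.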
We know that after 30 million self-play games the RL value network adds
roughly 450 Elo, but what about after 300 million self-play games?
(Possibly after training the RL policy further, too.)
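On the "once per game" point: the published pipeline draws a single training
position from each self-play game when building the value network's data set,
precisely to keep the examples nearly uncorrelated. A sketch of that
collection step, with the per-game sample count exposed as a knob (the
game-record format here is invented for illustration):

```python
import random

def build_value_dataset(selfplay_games, samples_per_game=1, seed=0):
    """Collect (position, outcome) pairs for value-network training.

    `selfplay_games` is assumed to be an iterable of (positions, z) records,
    where `positions` lists the states of one game and `z` is its final
    result (+1/-1).  samples_per_game=1 mirrors the published, conservative
    choice; raising it yields more data per game, at the price of highly
    correlated, near-duplicate positions from the same game.
    """
    rng = random.Random(seed)
    dataset = []
    for positions, z in selfplay_games:
        k = min(samples_per_game, len(positions))
        for idx in rng.sample(range(len(positions)), k):
            dataset.append((positions[idx], z))
    return dataset
```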
(My main clue for this was the comment that current AlphaGo self-play
games already look quite different from human games. Another explanation
for that might be that they found a way to replace the SL policy with the
RL policy in the tree.)
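For what "in the tree" refers to: during search the policy network's job is
to supply the priors that bias PUCT-style child selection at newly expanded
nodes, so swapping SL for RL there is, in principle, a one-line change. A
rough sketch of that role, with invented interfaces (`node.children`,
`policy_net.move_probabilities`) standing in for the real ones:

```python
import math

C_PUCT = 5.0   # exploration constant; the value here is arbitrary

def expand(node, policy_net):
    """At a leaf, attach children with priors from the chosen policy net.

    Replacing the SL policy with the RL policy "in the tree" just means
    passing a different `policy_net` here; the rest of the search is
    unchanged.
    """
    for move, prior in policy_net.move_probabilities(node.state):
        node.children[move] = {"prior": prior, "visits": 0, "value_sum": 0.0}

def select_child(node):
    """PUCT selection: exploit mean value, explore high-prior, low-visit moves."""
    total = sum(c["visits"] for c in node.children.values())
    def score(c):
        q = c["value_sum"] / c["visits"] if c["visits"] else 0.0
        u = C_PUCT * c["prior"] * math.sqrt(total + 1) / (1 + c["visits"])
        return q + u
    return max(node.children.items(), key=lambda kv: score(kv[1]))
```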
--
Petr Baudis
If you have good ideas, good data and fast computers,
you can do almost anything. -- Geoffrey Hinton