Hi Detlef,

You did not try reinforcement learning, I think. Do you have any idea
why it would make the policy network 250 Elo stronger, as mentioned
in the AlphaGo paper (80% win rate)?

I have not tried reinforcement learning, but I guess that if there are two
candidate moves, the SL probabilities might be:
taking 5 stones (35%), good shape (37%).
RL may change this to: taking 5 stones (80%), good shape (10%). For a weaker player, taking 5 stones is maybe the safer choice.
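
My understanding (from the paper, not from my own experiment) is that the RL
step reinforces moves in proportion to the game result, so probability mass
moves toward moves that win more self-play games. A toy Python sketch of that
idea; all the numbers and win rates below are made up, not measured:

import numpy as np

rng = np.random.default_rng(0)
logits = np.log(np.array([0.35, 0.37, 0.28]))  # SL-like start: take 5 stones, good shape, other
win_rate = np.array([0.70, 0.55, 0.40])        # assumed win rates when each move is played out
lr = 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for game in range(5000):
    p = softmax(logits)
    a = rng.choice(3, p=p)                           # sample a move from the current policy
    z = 1.0 if rng.random() < win_rate[a] else -1.0  # game result from this player's view
    grad = -p
    grad[a] += 1.0                                   # d log pi(a) / d logits for a softmax policy
    logits += lr * z * grad                          # REINFORCE: push probability toward winning moves

print(softmax(logits))   # most of the mass ends up on "take 5 stones"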

Do you think playing strength would be better if one only took into
account the moves of the winning player?

I think learning only from the winning player's moves will give a better result.
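
For example, building the training set would just skip positions where the
side to move is the eventual loser. A rough sketch; `games`, `moves` and
`winner` are placeholders, not my real data format:

def winner_only_examples(games):
    # games is assumed to be a list of (moves, winner) pairs, winner is 'B' or 'W'
    examples = []
    for moves, winner in games:
        color = 'B'                      # Black moves first
        for move in moves:
            if color == winner:
                examples.append((move, color))
            color = 'W' if color == 'B' else 'B'
    return examples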


Now I'm making 13x13 self-play games as in the AlphaGo paper.
1. Make a position by sampling moves from the Policy(SL) probabilities, starting from the initial position.
2. Play one move chosen uniformly at random from the available moves.
3. Play the remaining moves with Policy(RL) to the end of the game.
Step (2) usually plays a very bad move. Maybe it is there to make a
completely different position? I don't understand why step (2) is needed.
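
For reference, this is roughly how I read the three steps in code;
policy_sl, policy_rl and the board interface are placeholders for whatever
the engine provides, not real code from my program:

import random

def make_selfplay_game(board, policy_sl, policy_rl, max_moves):
    # Sketch of the three steps above; board/policy objects are hypothetical.
    u = random.randint(1, max_moves)                  # depth where the random move is injected
    for _ in range(u - 1):                            # 1. sample moves from the SL policy
        board.play(policy_sl.sample(board))
    board.play(random.choice(board.legal_moves()))    # 2. one move uniformly at random
    sample_position = board.copy()                    # position whose value will be learned
    while not board.game_over():                      # 3. play out the rest with the RL policy
        board.play(policy_rl.sample(board))
    return sample_position, board.winner()            # training pair (state, outcome)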

Thanks,
Hiroshi Yamashita
