Hi,

The author of HiraBot reported results for a minimax search using his Policy and Value networks. It does not use Monte Carlo at all.
Only the top 8 moves from the Policy network are searched at the root node;
at other depths, only the top 4 moves are searched.
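
Putting together the details in this mail (alpha-beta, top-8/top-4 policy
pruning, Value network at the leaves), the search is roughly as follows.
This is only a minimal Python sketch of the idea, not his actual code;
policy_net, value_net, legal_moves and play are hypothetical stand-ins:

  import math

  ROOT_WIDTH, INNER_WIDTH = 8, 4   # top 8 moves at the root, top 4 elsewhere

  def search(pos, depth, alpha, beta, is_root=False):
      # Negamax with alpha-beta over policy-pruned move lists;
      # the Value network scores the leaf positions.
      if depth == 0:
          return value_net(pos)
      probs = policy_net(pos)      # hypothetical: dict of move -> probability
      width = ROOT_WIDTH if is_root else INNER_WIDTH
      moves = sorted(legal_moves(pos), key=probs.get, reverse=True)[:width]
      best = -math.inf
      for m in moves:
          score = -search(play(pos, m), depth - 1, -beta, -alpha)
          best = max(best, score)
          alpha = max(alpha, score)
          if alpha >= beta:        # beta cutoff
              break
      return best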

Game results against the Policy network's best move (no search):

            (Win-Loss) winrate
MaxDepth=1, (558-442)  0.558   +40 Elo
MaxDepth=2, (351-150)  0.701  +148 Elo
MaxDepth=3, (406-116)  0.778  +218 Elo
MaxDepth=4, (670- 78)  0.896  +374 Elo
MaxDepth=5, (490- 57)  0.896  +374 Elo
MaxDepth=6, (520- 20)  0.963  +556 Elo
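
(A note on the conversion, added for clarity and not part of his report:
the Elo column follows from the usual logistic model,

  Elo = -400 * log10(1/p - 1)

e.g. p = 0.558 gives about +40 Elo.)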

The search is simple alpha-beta.
There is a modification so that moves with high Policy network probability
tend to be selected.
MaxDepth=6 takes about one second per move on an i7-4790K + GTX 1060.
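
He does not say how that modification works. One simple guess that would
produce the same behavior is to add a small policy-proportional bonus when
scoring moves in the sketch above, so that near-equal moves resolve toward
the Policy network's choice (POLICY_BONUS and its value are hypothetical,
not HiraBot's actual mechanism):

  POLICY_BONUS = 0.05   # hypothetical weight
  score = -search(play(pos, m), depth - 1, -beta, -alpha) + POLICY_BONUS * probs[m]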

His negamax code:
http://kiyoshifk.dip.jp/kiyoshifk/apk/negamax.zip
CGOS result for MaxDepth=6:
http://www.yss-aya.com/cgos/19x19/cross/minimax-depth6.html
His Policy network (without search) is probably:
http://www.yss-aya.com/cgos/19x19/cross/DCNN-No336-tygem.html
His Policy and Value network (MCTS) is probably:
http://www.yss-aya.com/cgos/19x19/cross/Hiratuka10_38B100.html

Thanks,
Hiroshi Yamashita

