Re: [Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13

2016-11-21 Thread Gian-Carlo Pascutto
On 17-11-16 22:38, Hiroshi Yamashita wrote:
> Value Net is 32 Filters, 14 Layers.
> 32 5x5 x1, 32 3x3 x11, 32 1x1 x1, fully connect 256, fully connect tanh 1

I think this should be:
32 5x5 x1, 32 3x3 x11, 1 1x1 x1, fully connect 256, fully connect tanh 1

Otherwise one has a 361 * 32 * 256 layer with about 3M weights, while all the
conv layers together have maybe 100k weights. That looks strange.
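
For concreteness, a quick back-of-the-envelope check of those counts
(illustrative Python, not code from Aya):

# Weights feeding the fully connected 256-unit layer on a 19x19 board,
# biases ignored, under both readings of the 1x1 layer.
board_points = 19 * 19                       # 361
as_posted = board_points * 32 * 256          # 32 1x1 filters -> 2,957,312 weights
as_corrected = board_points * 1 * 256        # 1 1x1 filter   ->    92,416 weights

# Rough size of the conv stack itself (50 input planes, 32 5x5 x1,
# 32 3x3 x11, one 1x1 filter): on the order of 100k weights.
conv_stack = 32 * 50 * 5 * 5 + 11 * (32 * 32 * 3 * 3) + 1 * 32 * 1 * 1
print(as_posted, as_corrected, conv_stack)   # 2957312 92416 141408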

> Features are 50 channels.
> http://computer-go.org/pipermail/computer-go/2016-March/008768.html

-- 
GCP

Re: [Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13

2016-11-21 Thread Detlef Schmicker
You are absolutely right. I was in "RL policy network" mode, so I thought
everything was about that. Sorry.

Am 21.11.2016 um 15:22 schrieb Gian-Carlo Pascutto:
> On 20-11-16 11:16, Detlef Schmicker wrote:
>> Hi Hiroshi,
>>
>>> Now I'm making 13x13 selfplay games like the AlphaGo paper. 1. Make a
>>> position by sampling from Policy(SL) probabilities from the initial
>>> position. 2. Play one move uniformly at random from the available
>>> moves. 3. Play the remaining moves by Policy(RL) to the end. Step (2)
>>> usually plays a very bad move. Maybe it is for creating completely
>>> different positions? I don't understand why this step (2) is needed.
>>
>> I did not read the AlphaGo paper like this.
>>
>> I read it as using the RL policy the "usual" way (I would say that means
>> something like randomizing with the net probabilities over the best 5
>> moves or so),
>>
>> but randomizing the opponent uniformly, meaning the opponent's network
>> is taken from an earlier step of the reinforcement learning.
>>
>> Meaning e.g.
>>
>> step 1 playing against step 7645 in the reinforcement history?
>>
>> Or did I understand you wrong?
> 
> You are confusing the Policy Network RL procedure with the Value Network
> data production.
> 
> For the Value Network the procedure is indeed as described, with the one
> move at time step U being uniformly sampled from {1, ..., 361}, resampling
> until it is legal. I think it's because we're not interested (only) in
> playing good moves, but also in analyzing positions that are as diverse as
> possible, to learn whether they're won or lost. Throwing in one totally
> random move vastly increases the diversity and the number of odd positions
> the network sees, while still not leading to totally nonsensical positions.
> 

Re: [Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13

2016-11-21 Thread valkyria
Yes, I think the important thing about the value function is that it detects
moves that are very bad, so that the MC eval does not have to sample many
variations more than once.

If the evaluation function were trained on pro moves only, it would not
know what a bad move looks like. At least the evaluation function would
not be able to see the difference between "very bad", "never good" and
"sometimes possible".


Magnus

On 2016-11-21 15:22, Gian-Carlo Pascutto wrote:

For the Value Network the procedure is indeed as described, with the one
move at time step U being uniformly sampled from {1, ..., 361}, resampling
until it is legal. I think it's because we're not interested (only) in
playing good moves, but also in analyzing positions that are as diverse as
possible, to learn whether they're won or lost. Throwing in one totally
random move vastly increases the diversity and the number of odd positions
the network sees, while still not leading to totally nonsensical positions.


Re: [Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13

2016-11-21 Thread Gian-Carlo Pascutto
On 20-11-16 11:16, Detlef Schmicker wrote:
> Hi Hiroshi,
> 
>> Now I'm making 13x13 selfplay games like the AlphaGo paper. 1. Make a
>> position by sampling from Policy(SL) probabilities from the initial
>> position. 2. Play one move uniformly at random from the available
>> moves. 3. Play the remaining moves by Policy(RL) to the end. Step (2)
>> usually plays a very bad move. Maybe it is for creating completely
>> different positions? I don't understand why this step (2) is needed.
> 
> I did not read the AlphaGo paper like this.
> 
> I read it as using the RL policy the "usual" way (I would say that means
> something like randomizing with the net probabilities over the best 5
> moves or so),
> 
> but randomizing the opponent uniformly, meaning the opponent's network
> is taken from an earlier step of the reinforcement learning.
> 
> Meaning e.g.
> 
> step 1 playing against step 7645 in the reinforcement history?
> 
> Or did I understand you wrong?

You are confusing the Policy Network RL procedure with the Value Network
data production.

For the Value Network the procedure is indeed as described, with the one
move at time step U being uniformly sampled from {1, ..., 361}, resampling
until it is legal. I think it's because we're not interested (only) in
playing good moves, but also in analyzing positions that are as diverse as
possible, to learn whether they're won or lost. Throwing in one totally
random move vastly increases the diversity and the number of odd positions
the network sees, while still not leading to totally nonsensical positions.
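
As a rough illustration of that data-generation loop (my reading of the
paper as a sketch; new_game, sl_policy_sample, rl_policy_sample, legal_moves,
play and game_result are hypothetical stand-ins, and the game object's copy()
and is_over() are assumed, not anyone's actual code):

import random

def make_value_training_example(new_game, sl_policy_sample, rl_policy_sample,
                                legal_moves, play, game_result, max_moves=450):
    # One (position, outcome) pair per self-play game.
    game = new_game()
    U = random.randint(1, max_moves)              # the single randomized time step
    for _ in range(U - 1):                        # moves before U: SL policy
        play(game, sl_policy_sample(game))
    play(game, random.choice(legal_moves(game)))  # move U: uniform over legal moves
    training_position = game.copy()               # position the value net learns from
    while not game.is_over():                     # remaining moves: RL policy
        play(game, rl_policy_sample(game))
    return training_position, game_result(game)   # e.g. +1 for a win, -1 for a loss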

-- 
GCP

Re: [Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13

2016-11-20 Thread Detlef Schmicker

Hi Hiroshi,

> Now I'm making 13x13 selfplay games like the AlphaGo paper. 1. Make a
> position by sampling from Policy(SL) probabilities from the initial
> position. 2. Play one move uniformly at random from the available
> moves. 3. Play the remaining moves by Policy(RL) to the end. Step (2)
> usually plays a very bad move. Maybe it is for creating completely
> different positions? I don't understand why this step (2) is needed.

I did not read the AlphaGo paper like this.

I read it as using the RL policy the "usual" way (I would say that means
something like randomizing with the net probabilities over the best 5
moves or so),

but randomizing the opponent uniformly, meaning the opponent's network
is taken from an earlier step of the reinforcement learning.

Meaning e.g.

step 1 playing against step 7645 in the reinforcement history?

Or did I understand you wrong?


Detlef

Re: [Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13

2016-11-19 Thread Hiroshi Yamashita

Hi Detlef,


You did not try reinforcement learning, I think. Do you have any idea
why this would make the policy network 250 Elo stronger, as mentioned
in the AlphaGo paper (80% winrate)?


I have not tried reinforcement learning, but I guess that if there are two
candidate moves, the SL probabilities might be
taking 5 stones (35%), good shape (37%),
and RL may change this to
taking 5 stones (80%), good shape (10%).
For a weaker player, taking 5 stones is maybe the safer choice.



Do you think playing strength would be better, if one only takes into
account the moves of the winning player?


I think learning only from the winning player's moves will give a better result.


Now I'm making 13x13 selfplay games like the AlphaGo paper:
1. Make a position by sampling from Policy(SL) probabilities from the initial position.
2. Play one move uniformly at random from the available moves.
3. Play the remaining moves by Policy(RL) to the end.
Step (2) usually plays a very bad move. Maybe it is for creating
completely different positions? I don't understand why this step (2) is needed.

Thanks,
Hiroshi Yamashita


Re: [Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13

2016-11-19 Thread Roel van Engelen
Hi Detlef

My bot is not pro level yet, but I built gosu games (similar to
waltheri.net) and I found certain "odd" positions occurring in 200+ games
where over 80% of the pros choose move A, while 90% of the games in which
move A is picked are lost by that player.

This suggests that in certain positions pro players choose a "sub-optimal" move.

To me it seems that the influence of these "sub-optimal" moves is
diminished by using reinforcement learning for a limited time;
unfortunately my implementation is not ready enough to verify this.

Roel

On 19 November 2016 at 09:07, Detlef Schmicker  wrote:

>
> Hi Hiroshi,
>
> thanks a lot for your info.
>
> You did not try reinforcement learning, I think. Do you have any idea
> why this would make the policy network 250 Elo stronger, as mentioned
> in the AlphaGo paper (80% winrate)?
>
> Are pros playing so bad?
>
> Do you think playing strength would be better, if one only takes into
> account the moves of the winning player?
>
> Detlef
>
> Am 19.11.2016 um 05:18 schrieb Hiroshi Yamashita:
> > Hi,
> >
> >> Did you not find a benefit from a larger value network? Too
> >> little data and too much overfitting? Or more benefit from more
> >> frequent evaluation?
> >
> > I did not find that a larger value network is better. But I think I need
> > more training data and stronger selfplay. I did not see
> > overfitting so far, and did not try more frequent evaluation.
> >
> >>> Policy + Value vs Policy, 1000 playouts/move, 1000 games. 9x9,
> >>> komi 7.0 0.634  using game result. 0 or 1
> >>
> >> I presume this is a winrate, but over what base? Policy network?
> >
> > Yes. Policy network (only root node) + value network vs. Policy
> > network (only root node).
> >
> >> How do you handle handicap games? I see you excluded them from
> >> the KGS dataset. Can your value network deal with handicap?
> >
> > I excluded handicap games. My value network cannot handle handicaps.
> > It is only for komi 7.5.
> >
> > Thanks, Hiroshi Yamashita
> >

Re: [Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13

2016-11-19 Thread Detlef Schmicker

Hi Hiroshi,

thanks a lot for your info.

You did not try reinforcement learning, I think. Do you have any idea
why this would make the policy network 250 Elo stronger, as mentioned
in the AlphaGo paper (80% winrate)?

Are pros playing so bad?

Do you think playing strength would be better, if one only takes into
account the moves of the winning player?

Detlef

Am 19.11.2016 um 05:18 schrieb Hiroshi Yamashita:
> Hi,
> 
>> Did you not find a benefit from a larger value network? Too
>> little data and too much overfitting? Or more benefit from more
>> frequent evaluation?
> 
> I did not find that a larger value network is better. But I think I need
> more training data and stronger selfplay. I did not see
> overfitting so far, and did not try more frequent evaluation.
> 
>>> Policy + Value vs Policy, 1000 playouts/move, 1000 games. 9x9,
>>> komi 7.0 0.634  using game result. 0 or 1
>> 
>> I presume this is a winrate, but over what base? Policy network?
> 
> Yes. Policy network (only root node) + value network vs. Policy
> network (only root node).
> 
>> How do you handle handicap games? I see you excluded them from
>> the KGS dataset. Can your value network deal with handicap?
> 
> I excluded handicap games. My value network cannot handle handicaps.
> It is only for komi 7.5.
> 
> Thanks, Hiroshi Yamashita
> 

Re: [Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13

2016-11-18 Thread Hiroshi Yamashita

Hi,


Did you not find a benefit from a larger value network? Too little data
and too much overfitting? Or more benefit from more frequent evaluation?


I did not find that a larger value network is better.
But I think I need more training data and stronger selfplay.
I did not see overfitting so far, and did not try more frequent evaluation.


Policy + Value vs Policy, 1000 playouts/move, 1000 games. 9x9, komi 7.0
0.634  using game result. 0 or 1


I presume this is a winrate, but over what base? Policy network?


Yes.
Policy network (only root node) + value network vs. Policy network (only root node).


How do you handle handicap games? I see you excluded them from the KGS
dataset. Can your value network deal with handicap?


I excluded handicap games.
My value network cannot handle handicaps. It is only for komi 7.5.

Thanks,
Hiroshi Yamashita


Re: [Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13

2016-11-18 Thread Gian-Carlo Pascutto
On 17/11/2016 22:38, Hiroshi Yamashita wrote:
> Features are 49 channels.
> http://computer-go.org/pipermail/computer-go/2016-February/008606.html
...
> Value Net is 32 Filters, 14 Layers.
> 32 5x5 x1, 32 3x3 x11, 32 1x1 x1, fully connect 256, fully connect tanh 1
> Features are 50 channels.
> http://computer-go.org/pipermail/computer-go/2016-March/008768.html

Thank you for this information. It takes a long time to train the
networks, so knowing which experiments have not worked is very valuable.

Did you not find a benefit from a larger value network? Too little data
and too much overfitting? Or more benefit from more frequent evaluation?

> Policy + Value vs Policy, 1000 playouts/move, 1000 games. 9x9, komi 7.0
> 0.634  using game result. 0 or 1

I presume this is a winrate, but over what base? Policy network?

> I also made a 19x19 Value net. The 19x19 learning positions are from KGS 4d
> and above, GoGoD, Tygem and 500 playouts/move selfplay: 990,255 games, with
> 32 positions selected from each game. Like Detlef's idea, I also use the
> game result. I trust B+R and W+R games with komi 5.5, 6.5 and 7.5. For other
> games, if the result is B+ and the winrate from 1000 playouts at the final
> position is over +0.60, I use the game.

How do you handle handicap games? I see you excluded them from the KGS
dataset. Can your value network deal with handicap?

At least in the KGS ruleset, handicap stones are added to the score
calculation, so it is required that the network knows the exact handicap.

-- 
GCP

[Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13

2016-11-17 Thread Hiroshi Yamashita

Hi,

Aya has reached pro level on GoQuest 9x9 and 13x13.
Aya got the highest rating in 9x9, and the highest peak rating in 13x13.
GoQuest is a Go app for Android, iPhone and browsers.

In 9x9 and 13x13, Aya uses a Policy network and a Value network.
The Policy net is the same as for 19x19.
It is trained on 78,000 GoGoD games, using 8 symmetries, for 120,000,000 positions.
Training took one month on a GTX 980. Accuracy is 51.0%.
12 Layers, 128 Filters.
128 5x5 x1, 128 3x3 x10, 128 3x3 x1
Features are 49 channels.
http://computer-go.org/pipermail/computer-go/2016-February/008606.html
The network is fully convolutional, so it can be used on 9x9 and 13x13 as well.
http://computer-go.org/pipermail/computer-go/2015-December/008324.html
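
Since every layer is a convolution, the same weights produce a move map for
any board size. A minimal PyTorch-style sketch of such a net (the final
1-filter output head is my assumption for illustration, not Aya's actual
network definition):

import torch
import torch.nn as nn

class FullyConvPolicy(nn.Module):
    def __init__(self, in_channels=49, filters=128):
        super().__init__()
        layers = [nn.Conv2d(in_channels, filters, 5, padding=2), nn.ReLU()]
        for _ in range(10):
            layers += [nn.Conv2d(filters, filters, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(filters, filters, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(filters, 1, 1)]      # assumed 1x1 head: one logit per point
        self.net = nn.Sequential(*layers)

    def forward(self, x):                         # x: (batch, 49, H, W), any H and W
        return torch.softmax(self.net(x).flatten(1), dim=1)

net = FullyConvPolicy()
for size in (9, 13, 19):                          # same weights on every board size
    print(size, net(torch.zeros(1, 49, size, size)).shape)  # (1, 81) (1, 169) (1, 361)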

The DCNN without search is +580 (19x19), +448 (13x13) and +393 (9x9) Elo
stronger than GNU Go (CGOS BayesElo).

9x9
DCNN_AyaF128a510x1  2193
Gnugo-3.7.10-a1  1800

13x13
DCNN_AyaF128a510x1  2248
Gnugo-3.7.10-a1  1800

19x19
DCNN_AyaF128a510x1  2380
Gnugo-3.7.10-a1  1800


Value Net is 32 Filters, 14 Layers.
32 5x5 x1, 32 3x3 x11, 32 1x1 x1, fully connect 256, fully connect tanh 1
Features are 50 channels.
http://computer-go.org/pipermail/computer-go/2016-March/008768.html
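
A sketch of how I read that value-net description (PyTorch-style, using GCP's
suggested single-filter 1x1 layer before the fully connected part; an
illustration of the described layout, not Aya's actual network definition):

import torch
import torch.nn as nn

class ValueNet(nn.Module):
    def __init__(self, in_channels=50, filters=32, board=9):
        super().__init__()
        convs = [nn.Conv2d(in_channels, filters, 5, padding=2), nn.ReLU()]
        for _ in range(11):
            convs += [nn.Conv2d(filters, filters, 3, padding=1), nn.ReLU()]
        convs += [nn.Conv2d(filters, 1, 1), nn.ReLU()]   # 1x1 reduction to one plane
        self.convs = nn.Sequential(*convs)
        self.fc1 = nn.Linear(board * board, 256)         # board-size specific
        self.fc2 = nn.Linear(256, 1)

    def forward(self, x):                                # x: (batch, 50, board, board)
        h = torch.relu(self.fc1(self.convs(x).flatten(1)))
        return torch.tanh(self.fc2(h))                   # value estimate in [-1, 1]

print(ValueNet(board=9)(torch.zeros(2, 50, 9, 9)).shape)   # torch.Size([2, 1])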
Learning positions are made by Aya's selfplay: 2,200,000 games for 9x9 and
1,000,000 games for 13x13. 16 positions are selected from each game.
9x9   uses 2000 playouts/move, komi 7.0 (CGOS 2290).
13x13 uses  500 playouts/move, with only the root node using the Policy net, komi 7.5 (CGOS 2433).
In 9x9, an opening book built from 8607 GoQuest games is used.
In 13x13, the first 16 moves are selected from the Policy net probabilities.
http://computer-go.org/pipermail/computer-go/2016-March/008970.html

At first, I used the playout winrate as the training target. If the Black
winrate at move 24 is 59%, the target is set to 0.59. But this is weaker
than using the game result, 0 or 1.

Policy + Value vs Policy, 1000 playouts/move, 1000 games. 9x9, komi 7.0
0.634  using game result. 0 or 1
0.552  using game result. Cubic approximation.
0.625  using game result. Linear approximation.
0.641  using game result. 0 or 1, dropout, half, all layers
0.554  using playout winrate

Linear approximation means: if the game ends at move 60 and the result is a
White win (0.0), then the value of the position at move 30 is 0.25.
Linear approximation does reduce the training loss, though (from 0.37 to 0.08,
on 19x19, with B win = +1.0, W win = -1.0).
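
Written as a tiny function, the way I read that interpolation (a sketch of my
interpretation, not Aya's code):

def linear_value_label(move_number, game_length, result):
    # result: 1.0 for a Black win, 0.0 for a White win.
    t = move_number / game_length            # 0.0 at the start, 1.0 at the end
    return 0.5 * (1.0 - t) + result * t

assert linear_value_label(30, 60, 0.0) == 0.25   # the W+ example above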

Policy + Value vs Policy, 1000 playouts/move, 13x13, komi 7.5
0.735 1000 playouts/move, 994 games

Compared with 9x9, it seems that stronger selfplay makes a stronger value net.


I also made a 19x19 Value net. The 19x19 learning positions are from KGS 4d
and above, GoGoD, Tygem and 500 playouts/move selfplay: 990,255 games, with 32
positions selected from each game. Like Detlef's idea, I also use the game
result. I trust B+R and W+R games with komi 5.5, 6.5 and 7.5. For other games,
if the result is B+ and the winrate from 1000 playouts at the final position
is over +0.60, I use the game.
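
The selection rule, as I read it (a sketch with hypothetical argument names;
the post only describes the B+ case for non-resignation games):

TRUSTED_KOMI = {5.5, 6.5, 7.5}

def keep_game(result, komi, final_black_winrate_1000po):
    # result is an SGF-style string such as "B+R", "W+R" or "B+2.5".
    if result in ("B+R", "W+R") and komi in TRUSTED_KOMI:
        return True                  # resignation games with a standard komi: trusted
    # otherwise require a B+ result confirmed by 1000 playouts at the final position
    return result.startswith("B+") and final_black_winrate_1000po > 0.60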

Policy + Value vs Policy, 19x19, komi 7.5, Filter  32, Layer 14
0.640  1000 playouts/move, 995 games
0.654  1000 playouts/move, 500 games, explicit symmetry ensemble(Value net only)
0.635  1000 playouts/move, 818 games, Linear approximation

Policy + Value vs Policy, 19x19, komi 7.5, Filter 128, Layer 14
0.667   500 playouts/move, 501 games.
0.664  2000 playouts/move, 530 games.

Policy + Value vs Policy, 19x19, komi 7.5, Filter 128, Layer 14, using 2000 playouts winrate
0.694  1000 playouts/move, 572 games
0.771 1 playouts/move, 332 games

Recently I found that the Black winrate is low in KGS games, because there
are many komi 0.5 games, and with komi 0.5 White tends to win. Maybe I need
to remove some White-win games.

19x19 Black winrate 0.418, komi 7.5, 30,840,000 positions, GoGoD, KGS 4d, Tygem
13x13 Black winrate 0.485, komi 7.5, 16,790,000 positions, selfplay, 500 playouts/move
9x9   Black winrate 0.514, komi 7.0, 33,760,000 positions, selfplay, 2000 playouts/move, draw counted as 0.5

Using Policy + Value (Filter 32), Aya reaches 7d on KGS.
The machine is a W3680 3.3GHz, 6 cores, with a GTX 980.
 AyaMC 4d
 AyaMC 6d  with Policy
 AyaMC 7d  with Policy and Value, handicaps <= 3, no dynamic komi.


GoQuest ranking (bots are not listed). "spaceman" is OHASHI Hirofumi 6p.
13x13   http://wars.fm/go13#users/0
 9x9   http://wars.fm/go9#users/0
AyaZBot http://wars.fm/go9#user/:ayazbot

Aya's GoQuest rating
                9x9    13x13
:AyaXBot       2322   2407   1 playout/move, only root node is Policy
:AyaZBot       2466   2361   year 2014
:AyaZBot       2647   2711   Policy+Value, W3680 3.3GHz, 6 cores, a GTX 980
:CrazyStoneBot 2592          year 2014

Note:
 The GoQuest time setting in 13x13 is 5 minutes plus 3 seconds added per move.
 Computers have an advantage with this setting.
http://computer-go.org/pipermail/computer-go/2015-December/008353.html


I wrote an article on how to make the Policy and Value networks.
http://www.yss-aya.com/deep_go.html
I'm afraid it is in Japanese, but some of the links may be useful.
http://www.yss-aya.com/20161004aya.zip
This includes Aya's network definition.

Open source
Ray
http://computer-go-ray.com/eng/index.html
Ray-nn, Ray with Policy and Value net, CGOS