Sorry, just to make sure I understand: is your concern that the network may be 
learning from the scoring system rather than through self-play? Or are you 
concerned that the scoring is giving sub-par evaluations of games?

The scoring I use simply counts the number of stones each player has on the 
board, then adds a point for each unoccupied space that is completely 
surrounded by a single player's stones. It is simplistic, and I think it does 
give sub-par evaluations of who the winner is--and is definitely a potentially 
serious deterrent to getting better performance. How much, maybe a lot. What do 
you think?
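In rough Python, the counting works something like this (a sketch of the idea, not the actual code from my implementation):

```python
def naive_score(board):
    """Count each player's stones, then credit every empty region that
    borders only one color to that color. Cells: 0 = empty, 1 = black,
    2 = white. Returns (black_score, white_score)."""
    rows, cols = len(board), len(board[0])
    score = {1: 0, 2: 0}
    for row in board:
        for v in row:
            if v:
                score[v] += 1
    seen = set()
    for i in range(rows):
        for j in range(cols):
            if board[i][j] != 0 or (i, j) in seen:
                continue
            # flood-fill one empty region, recording which colors border it
            region_size, borders = 0, set()
            stack = [(i, j)]
            seen.add((i, j))
            while stack:
                x, y = stack.pop()
                region_size += 1
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < rows and 0 <= ny < cols:
                        v = board[nx][ny]
                        if v == 0 and (nx, ny) not in seen:
                            seen.add((nx, ny))
                            stack.append((nx, ny))
                        elif v:
                            borders.add(v)
            # region surrounded completely by one player: count it as territory
            if len(borders) == 1:
                score[borders.pop()] += region_size
    return score[1], score[2]
```

Regions touching both colors count for neither player, which is where the sub-par evaluations come from.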

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Sunday, December 9, 2018 9:31 PM, uurtamo <uurt...@gmail.com> wrote:

> Imagine that your score estimator has a better idea about the outcome of the 
> game than the players themselves.
>
> Then you can build a stronger computer player with the following algorithm: 
> use the score estimator to pick the next move after evaluating all legal 
> moves, by evaluating their after-move scores.
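The greedy player described above could be sketched roughly like this (the `legal_moves`, `play`, and `estimate_score` callables are hypothetical stand-ins for whatever the engine provides, not part of any actual implementation):

```python
def greedy_move(board, player, legal_moves, play, estimate_score):
    """Return the legal move whose resulting position the estimator
    scores best for `player` (None if there are no legal moves).

    `play` must return a new position without mutating `board`;
    `estimate_score` returns higher values for positions better for
    `player`.
    """
    best_move, best_score = None, float("-inf")
    for move in legal_moves(board, player):
        after = play(board, move, player)
        s = estimate_score(after, player)
        if s > best_score:
            best_move, best_score = move, s
    return best_move
```

If such a one-ply greedy search beats the self-play networks, the estimator really is carrying more information about the outcome than the players are.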
>
> If you use something like Tromp-Taylor (not sure what most people use 
> nowadays) then you can score it less equivocally.
>
> Perhaps I was misunderstanding, but if not then this could be a somewhat 
> serious problem.
>
> s
>
> On Sun, Dec 9, 2018, 6:17 PM cody2007 <cody2...@protonmail.com> wrote:
>
>>>By the way, why only 40 moves? That seems like the wrong place to economize, 
>>>but maybe on 7x7 it's fine?
>> I haven't implemented any resign mechanism, so I felt 40 moves was a 
>> reasonable balance to at least see where the players roughly stand. Although 
>> I think I erred on the side of too few turns.
>>
>>>A "scoring estimate" by definition should be weaker than the computer 
>>>players it's evaluating until there are no more captures possible.
>> Not sure I understand entirely. But would agree that the scoring I use is 
>> probably a limitation here.
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>> On Sunday, December 9, 2018 8:51 PM, uurtamo <uurt...@gmail.com> wrote:
>>
>>> A "scoring estimate" by definition should be weaker than the computer 
>>> players it's evaluating until there are no more captures possible.
>>>
>>> Yes?
>>>
>>> s.
>>>
>>> On Sun, Dec 9, 2018, 5:49 PM uurtamo <uurt...@gmail.com> wrote:
>>>
>>>> By the way, why only 40 moves? That seems like the wrong place to 
>>>> economize, but maybe on 7x7 it's fine?
>>>>
>>>> s.
>>>>
>>>> On Sun, Dec 9, 2018, 5:23 PM cody2007 via Computer-go 
>>>> <computer-go@computer-go.org> wrote:
>>>>
>>>>> Thanks for your comments.
>>>>>
>>>>>>It looks like you made it work on 7x7. 19x19 would probably give better 
>>>>>>results, especially against yourself if you are a complete novice
>>>>> I'd expect that would make me win even more against the algorithm, since 
>>>>> it would explore a far smaller fraction of the search space, right? 
>>>>> Certainly something I'd be interested in testing, though--I'd just expect 
>>>>> it to take many more months of training. But it would be interesting to 
>>>>> see how much performance falls apart, if at all.
>>>>>
>>>>>>To avoid cheating against GNU Go, use its --play-out-aftermath 
>>>>>>parameter
>>>>> Yep, I evaluate with that parameter. The problem is more that I only play 
>>>>> 20 turns per player per game, and the network seems to like placing 
>>>>> stones in territories "owned" by the other player. My scoring system then 
>>>>> no longer counts that area as owned by the player. Playing out more turns 
>>>>> and/or using a more sophisticated scoring system would probably fix this.
>>>>>
>>>>>>If I'm not mistaken, a competitive AI would need a lot more training, 
>>>>>>such as what Leela Zero does: https://github.com/gcp/leela-zero
>>>>> Yeah, I agree more training is probably the key here. I'll take a look at 
>>>>> leela-zero.
>>>>>
>>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>> On Sunday, December 9, 2018 7:41 PM, Xavier Combelle 
>>>>> <xavier.combe...@gmail.com> wrote:
>>>>>
>>>>>> It looks like you made it work on 7x7. 19x19 would probably give better 
>>>>>> results, especially against yourself if you are a complete novice
>>>>>>
>>>>>> To avoid cheating against GNU Go, use its --play-out-aftermath 
>>>>>> parameter
>>>>>>
>>>>>> If I'm not mistaken, a competitive AI would need a lot more training, 
>>>>>> such as what Leela Zero does: https://github.com/gcp/leela-zero
>>>>>>
>>>>>> On 10/12/2018 at 01:25, cody2007 via Computer-go wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I've posted an implementation of the AlphaZero algorithm and a brief 
>>>>>>> tutorial. The code runs on a single GPU. While performance is not that 
>>>>>>> great, I suspect it's mostly been limited by hardware (my training and 
>>>>>>> evaluation have been on a single Titan X). The network can beat GNU Go 
>>>>>>> about 50% of the time, although it "abuses" the scoring a little 
>>>>>>> bit--which I talk more about in the article:
>>>>>>>
>>>>>>> https://medium.com/@cody2007.2/alphazero-implementation-and-tutorial-f4324d65fdfc
>>>>>>>
>>>>>>> -Cody
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Computer-go mailing list
>>>>>>> Computer-go@computer-go.org
>>>>>>>
>>>>>>> http://computer-go.org/mailman/listinfo/computer-go
>>>>>
