Could you be reading too much into my comment? AlphaGo Zero is an amazing
achievement, and I would guess its programmers will succeed in applying
their methods to other fields. Nonetheless, I thought it was interesting,
and it would appear the programmers did too, that before improving to
superhuman level, AlphaGo was temporarily stuck in a rut of playing
literally the worst first move on the board (excluding pass). That doesn't
mean I think I could do better.


On Tue, Nov 28, 2017 at 4:50 AM, uurtamo . <[email protected]> wrote:

> This is starting to feel like a question along the lines of, "how can I
> explain this to myself, or improve on what's already been done, in a way
> that will make this whole process run faster on my hardware?"
>
> It really doesn't look like there are a bunch of obvious shortcuts. That's
> the lesson of the decision trees humans imposed on the game for 20+ years;
> they weren't really better.
>
> Probably the best way to convince oneself of these things would be to
> challenge each assumption in divergent branches (as suggested earlier) and
> watch the resulting players' strength over time. Yes, this might take a
> year or more on your hardware.
>
> I feel like maybe a lot of this is sour grapes; let's please again
> acknowledge that the hobbyists aren't there yet without trying to tear down
> the accomplishments of others.
>
> s.
>
> On Nov 27, 2017 7:36 PM, "Eric Boesch" <[email protected]> wrote:
>
>> I imagine the implementation determines whether transferred knowledge is
>> helpful. It's like asking whether forgetting is a problem -- it often is,
>> but evidently not for AlphaGo Zero.
>>
>> One crude way to encourage stability is to include an explicit or
>> implicit age parameter that forces the program to perform smaller
>> modifications to its state during later stages. If the parameters you copy
>> from problem A to problem B also include that age parameter, so the network
>> acts old even though it is faced with a new problem, then its initial
>> exploration may be inefficient. For an MCTS-based example, if an MCTS node
>> is initialized to a 10877-6771 win/loss record based on evaluations under
>> slightly different game rules, then with a naive implementation, even if
>> the program discovers the right refutation under the new rules right away,
>> it would still need to revisit that node thousands of times to convince
>> itself the node is now probably a losing position.
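>>
>> To make that concrete, here is a rough back-of-the-envelope sketch in
>> Python. The Node class and its methods are purely illustrative (nothing
>> from the AlphaGo papers); the 10877-6771 record is just the figure from
>> the example above.
>>
>>     # Hypothetical naive MCTS node that inherits a stale win/loss record
>>     # from a closely related problem.
>>     class Node:
>>         def __init__(self, wins=0, visits=0):
>>             self.wins = wins      # accumulated wins for the side to move
>>             self.visits = visits  # total playouts through this node
>>
>>         def update(self, won):
>>             self.visits += 1
>>             if won:
>>                 self.wins += 1
>>
>>         def win_rate(self):
>>             return self.wins / self.visits
>>
>>     # Carry over the stale record: 10877 wins in 10877 + 6771 visits.
>>     node = Node(wins=10877, visits=10877 + 6771)
>>
>>     # Suppose that under the new rules the position is actually lost,
>>     # so every fresh playout through this node is a loss. Count how many
>>     # such playouts it takes before the node even looks like a coin flip.
>>     extra = 0
>>     while node.win_rate() > 0.5:
>>         node.update(won=False)
>>         extra += 1
>>
>>     print("losses needed to fall to 50%:", extra)  # prints 4106
>>
>> Even with perfectly consistent new evidence, this naive node needs on the
>> order of four thousand extra visits just to stop looking like a win.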
>>
>> But unlearning bad plans in a reasonable time frame is already a feature
>> you need from a good learning algorithm. Even AlphaGo almost fell into trap
>> states; from their paper, it appears that it stuck with 1-1 as an opening
>> move for much longer than you would expect from a program probably already
>> much better than 40 kyu. Even if it's unrealistic for Go specifically, you
>> could imagine some other game where after days of analysis, the program
>> suddenly discovers a reliable trick that adds one point for white to every
>> single game. The effect would be the same as your komi change -- a mature
>> network now needs to adapt to a general shift in the final score. So the
>> task of adapting to handle similar games may be similar to the task of
>> adapting to analysis reversals within a single game, and improvements to
>> one could lead to improvements to the other.
>>
>>
>>
>> On Fri, Nov 24, 2017 at 7:54 AM, Stephan K <[email protected]>
>> wrote:
>>
>>> 2017-11-21 23:27 UTC+01:00, "Ingo Althöfer" <[email protected]>:
>>> > My understanding is that the AlphaGo hardware is standing
>>> > somewhere in London, idle and waiting for new action...
>>> >
>>> > Ingo.
>>>
>>> The announcement at
>>> https://deepmind.com/blog/applying-machine-learning-mammography/ seems
>>> to disagree:
>>>
>>> "Our partners in this project wanted researchers at both DeepMind and
>>> Google involved in this research so that the project could take
>>> advantage of the AI expertise in both teams, as well as Google’s
>>> supercomputing infrastructure - widely regarded as one of the best in
>>> the world, and the same global infrastructure that powered DeepMind’s
>>> victory over the world champion at the ancient game of Go."
_______________________________________________
Computer-go mailing list
[email protected]
http://computer-go.org/mailman/listinfo/computer-go
