Could you be reading too much into my comment? AlphaGo Zero is an amazing achievement, and I would guess its programmers will succeed in applying their methods to other fields. Nonetheless, I thought it was interesting (and it would appear the programmers did too) that before improving to superhuman level, AlphaGo was temporarily stuck in a rut of playing literally the worst first move on the board (excluding pass). That doesn't mean I think I could do better.
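To put a number on the stale-prior point from my earlier message (quoted below), here is a toy sketch in Python. The 10877-6771 record is just the made-up figure from that example, and visits_to_flip is a hypothetical helper, not code from any real engine; it simply counts how many consecutive losing playouts a naively seeded node needs before its mean value drops under 50%. (A similar toy sketch of the age-parameter idea follows the quoted thread.)

    # Toy illustration of a naively transferred MCTS node. The seeded
    # 10877-6771 win/loss record is the invented example quoted below,
    # not a measurement from any real engine.
    def visits_to_flip(prior_wins, prior_losses, threshold=0.5):
        """Count consecutive losing playouts needed before the node's
        mean value drops below `threshold`."""
        wins = prior_wins
        visits = prior_wins + prior_losses
        extra = 0
        while wins / visits >= threshold:
            visits += 1  # every new playout under the new rules is a loss
            extra += 1
        return extra

    print(visits_to_flip(10877, 6771))  # prints 4107

Thousands of playouts spent re-refuting a single node is exactly the kind of cost that can keep a program in a rut long after it has, in some sense, already found the refutation.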
On Tue, Nov 28, 2017 at 4:50 AM, uurtamo . <[email protected]> wrote:

> This is starting to feel like asking along the lines of, "how can I explain this to myself or improve on what's already been done in a way that will make this whole process work faster on my hardware".
>
> It really doesn't look like there are a bunch of obvious shortcuts. That's the whole point of decision trees imposed by humans for 20+ years on the game; it wasn't really better.
>
> Probably what would be good to convince oneself of these things would be to challenge each assumption in divergent branches (suggested earlier) and watch the resulting players' strength over time. Yes, this might take a year or more on your hardware.
>
> I feel like maybe a lot of this is sour grapes; let's please again acknowledge that the hobbyists aren't there yet without trying to tear down the accomplishments of others.
>
> s.
>
> On Nov 27, 2017 7:36 PM, "Eric Boesch" <[email protected]> wrote:
>
>> I imagine implementation determines whether transferred knowledge is helpful. It's like asking whether forgetting is a problem -- it often is, but evidently not for AlphaGo Zero.
>>
>> One crude way to encourage stability is to include an explicit or implicit age parameter that forces the program to perform smaller modifications to its state during later stages. If the parameters you copy from problem A to problem B also include that age parameter, so the network acts old even though it is faced with a new problem, then its initial exploration may be inefficient. For an MCTS-based example, if an MCTS node is initialized to a 10877-6771 win/loss record based on evaluations under slightly different game rules, then with a naive implementation, even if the program discovers the right refutation under the new rules right away, it would still need to revisit that node thousands of times to convince itself the node is now probably a losing position.
>>
>> But unlearning bad plans in a reasonable time frame is already a feature you need from a good learning algorithm. Even AlphaGo almost fell into trap states; from their paper, it appears that it stuck with 1-1 as an opening move for much longer than you would expect from a program probably already much better than 40 kyu. Even if it's unrealistic for Go specifically, you could imagine some other game where after days of analysis, the program suddenly discovers a reliable trick that adds one point for white to every single game. The effect would be the same as your komi change -- a mature network now needs to adapt to a general shift in the final score. So the task of adapting to handle similar games may be similar to the task of adapting to analysis reversals within a single game, and improvements to one could lead to improvements to the other.
>>
>> On Fri, Nov 24, 2017 at 7:54 AM, Stephan K <[email protected]> wrote:
>>
>>> 2017-11-21 23:27 UTC+01:00, "Ingo Althöfer" <[email protected]>:
>>> > My understanding is that the AlphaGo hardware is standing somewhere in London, idle and waiting for new action...
>>> >
>>> > Ingo.
>>>
>>> The announcement at https://deepmind.com/blog/applying-machine-learning-mammography/ seems to disagree:
>>>
>>> "Our partners in this project wanted researchers at both DeepMind and Google involved in this research so that the project could take advantage of the AI expertise in both teams, as well as Google’s supercomputing infrastructure - widely regarded as one of the best in the world, and the same global infrastructure that powered DeepMind’s victory over the world champion at the ancient game of Go."
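And as for the age-parameter idea above, here is a minimal sketch of the effect, assuming nothing more than a step size that shrinks with the learner's age. The Learner class, the step_size schedule, and all the constants are invented for illustration; this is not how AlphaGo Zero (or anything else) actually schedules its updates.

    # Toy sketch: a learner whose update step shrinks as it "ages".
    # If the age is copied along with the parameters when transferring
    # to a new problem, adaptation to the new target is very slow.
    def step_size(age, base=0.1, decay=1e-3):
        return base / (1.0 + decay * age)

    class Learner:
        def __init__(self, value=0.0, age=0):
            self.value = value
            self.age = age  # transferred along with the parameters

        def update(self, target):
            self.value += step_size(self.age) * (target - self.value)
            self.age += 1

    old = Learner(age=1_000_000)  # "acts old" on a brand-new problem
    new = Learner(age=0)          # fresh learner, same starting value
    for _ in range(1000):
        old.update(1.0)
        new.update(1.0)
    print(f"{old.value:.2f} vs {new.value:.2f}")  # roughly 0.10 vs 1.00

Which is one reason you might want to reset or discount such an age term on transfer, rather than letting the network act old on a problem it has never seen.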
_______________________________________________
Computer-go mailing list
[email protected]
http://computer-go.org/mailman/listinfo/computer-go
