Hi Russell,

> On Thu, 2018-07-12 at 18:15 +0100, Ian Jackson wrote:
> > Compare neural networks: a user who uses a pre-trained neural network
> > is subordinated to the people who prepared its training data and set
> > up the training runs.
>
> In Alpha-Zero's case (it is Alpha-Zero the original post was about)
> there is no training data. It learns by being run against itself.
> Intel purchased Mobileye (the system Tesla used to use, and maybe
> still does) with largely the same intent. The training data in that
> case is labelled videos resembling dash cam footage. Training the
> neural network requires huge amounts of it, all of which was produced
> by Mobileye by having humans watch the video and label it. This was
> expensive and eventually unsustainable. Intel said they were going to
> attempt to train the network with videos produced by game engines. I
> haven't seen much since Intel purchased Mobileye, however if they
> succeed we are in the same situation - there is no training data. In
> both cases it is just computers teaching themselves.
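As an aside, the "no training data" idea quoted above can be illustrated with a toy (this is a hypothetical sketch using exhaustive minimax over the rules of tic-tac-toe, not AlphaZero's network + tree search): everything the program "knows" is derived from playing the game against itself, with no labelled data at all.

```python
from functools import lru_cache

# All eight winning lines on a 3x3 board (indices into a 9-char string).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    """Game-theoretic value for X (+1 win, 0 draw, -1 loss),
    computed purely by self-play over the rules."""
    w = winner(board)
    if w:
        return 1 if w == "X" else -1
    moves = [i for i, c in enumerate(board) if c == "."]
    if not moves:
        return 0  # board full, no winner: draw
    nxt = "O" if player == "X" else "X"
    vals = [value(board[:i] + player + board[i + 1:], nxt) for i in moves]
    return max(vals) if player == "X" else min(vals)

print(value("." * 9, "X"))  # -> 0: perfect self-play ends in a draw
```

No human ever told the program that tic-tac-toe is a draw; it derived that by exploring its own games, which is the (vastly scaled-down) flavour of the self-play argument.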
To be clear, there are mainly three types of learning: (1) supervised
learning [1]; (2) unsupervised learning [2]; (3) reinforcement learning
[3]. AlphaGo Zero is based on reinforcement learning, but it is a bit
special: we can generate meaningful data in the state space of the
board. However, much other current reinforcement learning research
uses data that is not easy to generate algorithmically. For example, I
remember a group of people tried to teach a neural network to drive a
car by letting it play Grand Theft Auto V [4].

Supervised learning (which requires labelled data) and unsupervised
learning (which requires unlabelled data) often need a large amount of
data, and that data may come with license restrictions. As for
reinforcement learning's data ... well, I'm confused and I don't want
to dig deeper.

> The upshot is I don't think focusing on training data or the initial
> weights is a good way to reason about what is happening here. If Deep
> Mind released the source code for Alpha-Zero anyone could in
> principle reproduce their results if you define their result as I'm
> pretty sure they do: produce an AI capable of beating any other AI on
> the planet at a particular game. The key words are "in principle" of
> course, because the other two ingredients they used were 250 MWh of
> power (a wild guess on my part) and enough computers to be able to
> expend that in 3 days.

Releasing the initial weights doesn't make sense. The initial weights
of state-of-the-art neural networks are simply drawn from a certain
Gaussian distribution or a certain uniform distribution. The key to
reproducing a neural network is the input data plus the
hyper-parameters, such as the learning rate used during gradient
descent.

> A better way to think about this is the AI they created is just
> another chess (or Go or whatever) playing game, no different in
> principle to chess games already in Debian. However, its move
> pruning/scoring engine was created by a non-human intelligence.
> The programming language that intelligence uses (the weights on a
> bunch of interconnected polynomials) and the way it reasons (which
> boils down to finding the minima of a high-dimensional curve using
> Newton's method to slide down the slope) is not something human minds
> are built to cope with. But even though we can't understand them,
> these weights are the source, as if you give them to a similar AI it
> can change the program. In principle the DFSG is fulfilled if we
> don't discriminate against non-human intelligences.
>
> Apart from the "non-human" intelligence bit none of this is different
> to what we _already_ accept into Debian. It's very unlikely I could
> make sensible contributions to the game engines of the best chess,
> backgammon or Go programs Debian has now. I have no doubt I could
> understand the source, but it would take me weeks / months if not
> years to understand the reasoning that went into their move scoring
> engines. The move scoring engine happens to be the exact thing
> Alpha-Zero's AI (another thing I can't modify) replaces. In the case
> of chess at least they will have a database of end games they rely
> on, a database generated by brute force simulations using quantities
> of CPU cycles I simply could not afford.
>
> Nonetheless, cost is an issue. To quantify it I presume they will be
> able to rent the hardware required from a cloud provider - possibly
> we could do that even now. But the raw cost of that 250 MWh of power
> is $30K, and I could easily imagine it doubling many times as it goes
> through the supply chain, so as another wild guess you are probably
> looking at $1M to modify the program. $1M is certainly not "free" in
> any sense of the word, but then in reality no other Debian
> development is free either. All development requires computers and
> power which someone has to pay for.
> The difference now is merely one of a few added noughts, and those
> noughts exclude almost all of us from working on the source. But I'd
> be surprised if there aren't Debian users out there who *do* have the
> means to fiddle with these programs if they had the weights and the
> source used to create them. Which means anyone could work on them if
> they had the means - but I don't have the means. *shrug*

Yes, cost is a big issue. The point of my original post is exactly the
"time cost". And sometimes there is hardware cost too.

> Which is how I reach the opposite conclusion to Ian. If Deep Mind
> released Alpha-Zero source code under a suitable licence, plus some
> example neural networks they generated with it (that happen to be the
> bit everyone uses), Debian rejecting the example networks as not
> "DFSG free" would be a mistake. I view one of our roles as advancing
> free software, all free software. Rejecting some software because we
> humans don't understand it doesn't match that goal.

According to the previous discussion, the two biggest problems are:

1. the license of the big data;
2. it is hard for a user to modify or reproduce such a work with a
   pure-free software stack.

These seem very hard to solve. The talk at DebConf12 that pabs pointed
out raised nearly the same problems. Now it's 2018, 6 years have
passed, and it appears there has been no progress on this point.

[1] https://en.wikipedia.org/wiki/Supervised_learning
[2] https://en.wikipedia.org/wiki/Unsupervised_learning
[3] https://en.wikipedia.org/wiki/Reinforcement_learning
[4] https://en.wikipedia.org/wiki/Grand_Theft_Auto_V
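P.S. A toy sketch of the "initial weights" point above (hypothetical numpy code, nothing to do with DeepMind's actual setup): for a simple convex problem, the same data and hyper-parameters reproduce the same trained model even from *different* random initial weights, which is why releasing the initial weights adds nothing.

```python
import numpy as np

def train(seed, X, y, lr=0.1, steps=5000):
    """Fit a linear model y ~ X @ w by plain gradient descent.

    The initial weights are drawn from a Gaussian -- they carry no
    information of their own; only the data and the hyper-parameters
    (here, lr and steps) determine the result.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])        # "initial weights": just noise
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad                     # hyper-parameter: learning rate
    return w

# Same data + same hyper-parameters, two different random initialisations:
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([2.0, -3.0])              # true weights: (2, -3)
w_a = train(seed=0, X=X, y=y)
w_b = train(seed=1, X=X, y=y)
print(np.allclose(w_a, w_b, atol=1e-6))    # -> True: same trained model
```

Deep networks are non-convex, so different seeds can land in different (roughly equally good) minima, but the point stands: the seed is noise, and what you actually need to reproduce a model is the data and the hyper-parameters.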