Hi Ian,

Yes, it would be great to improve the playing level of gnubg once more. Here are (yet again) my thoughts and comments.
The net was not trained from the rollout results, but by using 2-ply evaluations. The best I could come up with resulted from the choice of which positions to include in the training set. And for my particular training method, more was frequently not better. My personal view is that to move past the next step we need a new method to generate an evaluation net. This requires some thought and research by someone other than me, as I am set in my ways.

The few BGBlitz games I saw (the gnubg vs. BGBlitz ones) seem to indicate that BGBlitz moves its checkers at the same level, while gnubg's cube handling is slightly better. When zbot hits FIBS we will see how it compares.

There is one thing that could have helped, and that is getting Harald Wittman to contribute his mloner net. If there is a bot which, in my mind, has a more than even chance of being stronger than gnubg, it is mloner.
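To give a flavour of what "training by 2-ply evaluations" means, here is a very rough sketch in Python. The real trainer is a separate C program and a proper neural net; the linear "net" and every name below are made up for illustration only:

import random

# Bootstrap idea: the training targets are not rollout results but
# 2-ply evaluations by the previous net, computed once, offline.
# A linear model and delta-rule update stand in for the real net.

def train(net, examples, epochs=10, lr=0.01):
    """examples: list of (inputs, target) pairs, where each target
    is a 2-ply evaluation of the position by the previous net."""
    for _ in range(epochs):
        random.shuffle(examples)
        for inputs, target in examples:
            out = sum(w * x for w, x in zip(net, inputs))  # 0-ply output
            err = target - out
            for i, x in enumerate(inputs):                 # delta-rule update
                net[i] += lr * err * x
    return net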
-Joseph

On 7/18/06, Ian Shaw <[EMAIL PROTECTED]> wrote:

Astonishingly, it's been about three years since version 0.14 of gnubg was released. It has proved to be superior to JellyFish and at least the equal of Snowie 4. Since then, BgBlitz has arrived as a serious opponent, and rumours of Z-bot's approach persist. If it ever arrives, I'm sure it will be a strong player.

I think we've rested on our laurels long enough, and it's about time we started trying to improve the playing strength of our favourite bot. I can think of several ways we might seek to make improvements:

A) Speed up the evaluation function so gnubg can search faster, and maybe deeper.
B) Improve the evaluation function by changing the neural net inputs or hidden nodes.
C) Retrain the existing net using a new set of training positions.
D) Retrain the existing net using newer rollouts of the current set of training positions.

I'm keen to discuss A, B and C, but this post is going to focus on the last method. If this broadens into a far-reaching discussion, I think it will help to keep the themes separate. Even if A or B proves to offer the biggest benefits, improving the training database will be advantageous, so the work won't go to waste.

CURRENT TRAINING DATABASE

I will summarise the current state of play, as far as I understand it. Please correct me if I'm wrong.

We have a large set of positions rolled out 1296 times at 0-ply. The positions were rolled out using the 0.13 weights. This position database was then used by Joseph Heled to train the neural network, leading to the version 0.14 weights that we currently use.

The positions were chosen from the following sources:
- Games recorded on FIBS
- Positions generated by gnubg playing against itself

Positions were included in the database if the 0-ply evaluation disagreed with the 2-ply evaluation, indicating that gnubg does not understand the position well. (A rough sketch of this filter appears below, after the numbered list.)

The position database is divided into the following three categories, and subdivided into numbered files to enable the work to be shared:

Race 0000 - 0046: Contact has been broken; both players are simply trying to race around the board and bear off as fast as possible.
Crashed 0000 - 0085: Contact positions where one side has crashed, with several men on the first 2 or 3 points.
Grand-Pos 0000 - 0150: More crashed positions.
Doubles: The doubles database includes crashed positions which have a forced move or no move (so there cannot be a discrepancy between plies).
Contact 0000 - 0108: The general state of play where there is still contact but the position is not crashed.

More information can be found on Joseph Heled's pages: http://pages.quicksilver.net.nz/pepe/ngb/index-top.html

RETRAINING THE EXISTING NET

We used gnubg 0.13 to generate the current database, giving us the training data to produce version 0.14. I propose to update this database by re-rolling it using version 0.14. This will give us data to enable us to produce version 0.15. Since gnubg 0.14 is already very strong, I would expect only a small improvement, at best, but I think it's an obvious place to start.

I need some HELP here.

1) Firstly, I need the 0.14 weights translated into a format that the rollout programme "sagnubg" can understand. This is a text file of floating point numbers, and is not in the same format as the gnubg.wd file. I have sagnubg030101, which I assume is the latest version.

2) I don't have all the training database data. I've still got the ones I rolled out, but there is a large amount missing. Hopefully Joseph can send me the lot, but just in case, please could you send me any data you have if you were part of the rollout team.

3) I don't know how to train the NN once the rollout is done. Joseph used his own program external to gnubg. I've no idea how much work is involved at this stage. Perhaps Joseph is willing to have another go, or teach me what to do.

4) Anyone who wants to help by rolling out positions is more than welcome. Summer's here and people are going on holiday, leaving lots of PCs looking for something to do. If you have a PC or two that will be idle for a while, why not set it to work? If you do have more than one networked PC, I have some DOS batch files that (crudely) co-ordinate the work among several PCs.

5) What order should these be attacked in? I propose to start with the Contact positions. The Race net is already very strong, and I think Joseph struggled to improve the Crashed net performance.
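To make the selection criterion above concrete, here is roughly how I picture the filter, in Python. The evaluate() callable and the 0.02 equity cutoff are my own inventions for illustration; Joseph's actual tools may have worked quite differently:

def select_training_positions(positions, evaluate, threshold=0.02):
    """Keep positions where the net's shallow and deep evaluations
    disagree, i.e. positions the net does not yet understand.

    evaluate(pos, plies) should return an equity estimate; the
    threshold is in equity units."""
    selected = []
    for pos in positions:
        e0 = evaluate(pos, plies=0)   # raw net equity
        e2 = evaluate(pos, plies=2)   # deeper search equity
        if abs(e2 - e0) > threshold:  # net misjudges this one
            selected.append(pos)
    return selected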
GNUBG'S ODD-EVEN EFFECT

It has been observed on numerous occasions that gnubg's even-ply evaluations agree with each other more than they agree with the interleaved odd-ply evaluations. That is, 0- and 2-ply tend to agree with each other, as do 1- and 3-ply.

This is caused by the evaluation function always looking from the point of view of the player about to play. At even plies, it tries to maximise the player's equity, whilst at odd plies it tries to maximise the opponent's equity, thus minimising the equity of the original player. Since gnubg tries to maximise the equity at each ply, it will tend to pick moves that are overvalued at that depth, leading to the swings we see between odd and even plies.

I have an idea that might mitigate this tendency. I wonder if it would be beneficial to invert all the positions and equities in the rollouts (a sketch of what I mean follows below). This would give us the rollout data for each complementary position. We would effectively double the size of the rollout database for almost no effort.

I can think of two potential drawbacks.

1) It would increase the training time. Is training time linearly proportional to database size, or some exponential function such as the square of the database size?

2) We would have the same data twice, presented in different formats. This might encourage the NN to train to "fit" the data in the database, whereas we are looking to generalize the evaluation function over the entire position class.

Nis Jorgenson and Joseph Heled investigated the idea of combining odd- and even-ply evaluations to produce a more accurate evaluation. The results were positive (see http://lists.gnu.org/archive/html/bug-gnubg/2003-02/msg00218.html), but they were not incorporated into gnubg. I don't know why not; possibly it was due to the overhead of combining information from two plies. I'm wondering if my idea might have some of the benefits of their idea, in that it considers both sides of a position, but does it at the training stage, where it is a one-off cost in processor power.
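To show what I mean by inverting, here is a rough sketch in Python. I'm assuming a gnubg-style five-output rollout result (win, win gammon, win backgammon, lose gammon, lose backgammon) and a board stored as a pair of per-player checker arrays; the representation and names are illustrative only:

def invert_example(board, probs):
    """Return the same rollout result seen from the other side."""
    me, opp = board
    swapped_board = (opp, me)          # view from the opponent's side
    win, wg, wbg, lg, lbg = probs
    swapped_probs = (1.0 - win,        # my win chance is his loss chance
                     lg, lbg,          # his gammon/backgammon losses
                     wg, wbg)          # become wins, and vice versa
    return swapped_board, swapped_probs

def double_database(examples):
    """Add the complementary position for every rolled-out example."""
    doubled = []
    for board, probs in examples:
        doubled.append((board, probs))
        doubled.append(invert_example(board, probs))
    return doubled

Each swapped example carries exactly the same information as the original, just presented from the side the evaluator sees at the next ply, which is why I hope it addresses the odd-even asymmetry.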
I'd be interested in all comments. I'd particularly like to get some help from Øystein or Joseph to get me started - I go on holiday in two weeks and I'd like to leave my PC busy.

Regards,
Ian Shaw

_______________________________________________
Bug-gnubg mailing list
Bug-gnubg@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-gnubg