Re: [computer-go] GPGPU
Hi Petr, I missed this posting of yours from Feb 10.

On Feb 10, 2009, at 1:44 AM, Petr Baudis wrote:
Hi! There has been some talk about implementing Monte Carlo playouts on GPUs in the past, I have heard rumours about a Polish bachelor student doing a libego-to-GPGPU conversion as a project, etc., but I know of nothing concrete ever materializing - is anybody aware of anything?

Of course, as we all realize, a lot of people have already tried all kinds of stuff on GPGPU, and obviously first on NVIDIA. For chess it is at the moment more complicated than for go, as a shared hash table is really important in chess. That will become really important in go as well.

Whether the current generation of NVIDIA or AMD chips can reach the optimum of the claimed performance is of course an interesting question today. However, it is not so relevant in the long term; what really matters is the question: what is theoretically possible here? We can simply assume that future generations improve, be it NVIDIA, be it AMD, be it some other manufacturer (Intel?). With some extra overhead it is quite possible to have a shared hash table. All those things I have solved here on paper.

The really interesting thing is of course branches. How do you replace them with near-branchless code? That is not so easy, and it simply requires overhead. In the case of pattern matching, it is obvious that on x86 you can skip a lot of patterns easily and cheaply with a single compare that fails. So that is at most a cycle or 30 you lose - probably a lot LESS, 15 or so. Not on a GPGPU. However, as compensation, GPGPUs can push through more instructions.
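To illustrate what 'near-branchless' means for the pattern matching case - a minimal C sketch with made-up pattern fields (the names and the 32-bit pattern encoding are mine, purely for illustration): the first version relies on an early-out branch, which is cheap on x86 but costs a SIMD device a divergent path; the second computes the same result with a mask, so identical instructions execute whether the pattern matches or not.

```c
#include <stdint.h>

/* Branchy version: a single failing compare skips the pattern.
 * Cheap on x86, but a divergent branch on a SIMD GPU. */
static int match_branchy(uint32_t board_bits, uint32_t pat_mask,
                         uint32_t pat_bits, int weight)
{
    if ((board_bits & pat_mask) != pat_bits)
        return 0;
    return weight;
}

/* Branchless version: turn the compare into an all-zeros/all-ones
 * mask and AND the weight with it. Same result, no branch. */
static int match_branchless(uint32_t board_bits, uint32_t pat_mask,
                            uint32_t pat_bits, int weight)
{
    uint32_t hit = ((board_bits & pat_mask) == pat_bits); /* 0 or 1 */
    return weight & -(int)hit;                            /* weight or 0 */
}
```

The overhead he mentions is visible here: the branchless version always pays for the mask arithmetic, even when a single compare would have rejected the pattern immediately.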
Theoretically, for a quad-core, say an i7 965: 3.2 GHz * 4 instructions per cycle * 4 cores = 51.2 G instructions per second = 51.2 GIPS.
AMD: 320 cores * 5 execution units = 1600 execution units * 0.97 GHz * 1 instruction per cycle = 1552 GIPS.
NVIDIA: 480 stream cores * 1.5 GHz * 1 instruction per cycle = 720 GIPS.

Note that NVIDIA sometimes counts two per cycle because of the floating-point multiply-add, so I do not want to say that AMD is superior to NVIDIA here or vice versa, as they are totally different architectures to start with, and AMD is obviously not capable of doing a multiplication in all 5 units of each full core. So the GFLOP count is something totally different from what we do, namely game tree search. A factor of 2 here is really not interesting to discuss, so I really don't want to get into which of the two architectures is faster for game tree search at this moment. I might once I get sponsored by one of the manufacturers, though :)

The interesting thing is that on paper the fastest Intel quad-core chip is, say, a factor 30 slower than the AMD part if we look at raw GFLOPs. From all kinds of GFLOP research we know that the x86s really get well over 90% efficiency, while the best claim ever for serious software running in volume on GPUs is 40% efficiency. Scientists who therefore claim more than, say, a factor-15 speed difference between a GPU and an x86 are not objective, if not committing complete fraud on paper.

My estimate is that it would be possible to do things at a factor-5 loss, so you are then 6 times faster than a quad-core. As I like numbers that are a multiple of 5 in this case, I round it down to 5. Whatever complex calculation I do, taking into account all factors, after months of study over a period of nearly 3 years now, I come down to factor 5 every time, and have for a number of years. Achieving this is not simple. Let's first look at whether a memory bandwidth factor of 5 is also possible.
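The arithmetic behind these paper numbers, as a quick C check (the clock speeds and unit counts are the figures quoted above, not measurements):

```c
#include <stdio.h>

/* Peak instruction throughput in GIPS: clock (GHz) * instructions
 * per cycle * number of cores/units. Paper figures only. */
static double gips(double ghz, double instr_per_cycle, double units)
{
    return ghz * instr_per_cycle * units;
}

static void print_estimates(void)
{
    double i7  = gips(3.2, 4, 4);        /* i7 965: 51.2 GIPS           */
    double amd = gips(0.97, 1, 320 * 5); /* 320 cores * 5 units: 1552   */
    double nv  = gips(1.5, 1, 480);      /* 480 stream cores: 720 GIPS  */
    printf("i7 %.1f  AMD %.1f  NVIDIA %.1f  AMD/i7 %.1f\n",
           i7, amd, nv, amd / i7);       /* ratio is roughly factor 30  */
}
```

The AMD/i7 ratio of about 30 is where the 'factor 30 on paper' claim above comes from; the factor-5 practical estimate then follows from discounting that by the efficiency gap.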
If we look at memory bandwidth, this factor of 5 to 6 you can sustain in real time, nonstop, 24 hours a day: the claimed bandwidth of the i7 is around 20-30 GB/s, and the top-end GPUs now are definitely over a factor 5 faster in bandwidth. AMD has GDDR5 there (not sure NVIDIA has it already, but if not they will follow soon), so the GPUs effectively have RAM two generations newer than the quad-cores. The difference in RAM alone already means a factor 4 for the GPUs; add to that the larger number of memory controllers they use compared to CPUs, and the fact that a lot gets done in a decentralized manner on GPUs, whereas CPUs have a centralized L3 cache in front of the RAM. So bandwidth-wise it is quite possible to be a factor 5 or so faster in practice, up to 10 if you are a bit more optimistic.

However, getting that out of the GPUs is not so easy for game tree search. Low-level optimization is already really complicated. Here you also need at least a two-layer parallel model, and you have to figure out exactly where the chip can reach a high IPC (number of instructions per cycle) and how to achieve it. We have had, say, 30 years of optimizing for sequential processors with branches; programming code with nearly no branches is a totally different league. Using the
Re: [computer-go] CUDA and GPU Performance
On Sep 13, 2009, at 10:19 AM, Petr Baudis wrote:
On Sun, Sep 13, 2009 at 01:02:40AM +0200, Vincent Diepeveen wrote:
On Sep 10, 2009, at 12:55 AM, Michael Williams wrote:
Very interesting stuff. One glimmer of hope is that the memory situation should improve over time, since memory grows but Go boards stay the same size.

Well, you first have to figure out how fast or slow shifting is on the NVIDIAs before actually going for it :)

Just read the nVidia docs. Shifting has the same cost as addition.

Document number and URL?

P.S.: The PTX assembler is also documented. Or did you mean some secret undocumented instruction goodies?
Petr "Pasky" Baudis
___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] CUDA and GPU Performance
A document from 2 weeks ago where they write at least *something* - not bad from NVIDIA, knowing they soon have to give lessons to top coders :)

It is not a really systematic approach, though. We want a list of all instructions with the latency and throughput that belong to each. Lookup times for the caches, shared memory and RAM would also be great to know, even if it were only bandwidth numbers.

I do see references to sections B and C for a multiplication instruction and a memory fetch instruction that seem to exist, but I cannot find those sections at all. And nowhere in the document do I see which hardware instructions the NVIDIA hardware supports. Mind giving a page number?

Vincent

On Sep 13, 2009, at 11:43 AM, Petr Baudis wrote:
On Sun, Sep 13, 2009 at 10:48:12AM +0200, Vincent Diepeveen wrote:
On Sep 13, 2009, at 10:19 AM, Petr Baudis wrote:
Just read the nVidia docs. Shifting has the same cost as addition.
Document number and URL?
http://developer.download.nvidia.com/compute/cuda/2_3/toolkit/docs/NVIDIA_CUDA_Programming_Guide_2.3.pdf
P.S.: The PTX assembler is also documented. Or did you mean some secret undocumented instruction goodies?
http://www.nvidia.com/object/io_1213955209837.html
-- Petr "Pasky" Baudis
A lot of people have my books on their bookshelves. That's the problem, they need to read them. -- Don Knuth
Re: [computer-go] CUDA and GPU Performance
Thanks for sharing this, Christian; my comments are inline.

On Sep 9, 2009, at 5:54 PM, Christian Nentwich wrote:
I did quite a bit of testing earlier this year on running playout algorithms on GPUs. Unfortunately, I am too busy to write up a tech report on it, but I finally brought myself to take the time to write this e-mail at least. See bottom for conclusions. For performance testing, I used my CPU board representation, and a CUDA port of the same (with adjustments), to test the following algorithm:
- Clear the board
- Fill the board according to a uniform random policy
- Avoid filling eyes, according to a simple neighbour check
- Avoid simple ko
- Count the score and determine the winner
In other words: no tree search is involved, and this is the lightest possible playout. The raw numbers are as follows:
- CPU search: 47,000 playouts per CPU core per second, on an Intel 6600 Core 2 Duo
- GPU search: 170,000 playouts per second, on an NVIDIA GeForce 285 card

Interesting to know would be how many nodes per second this is that hammer the local shared memory, and what IPC you got overall. That 9% figure might not be very accurate - read further for why.

NVIDIA is of course the weakest of all GPU manufacturers currently if you run on the consumer GPUs rather than the Tesla. NVIDIA is of course most famous because they were the first to promote GPGPU to the big masses. Note that in the 90s I already stumbled upon implementations of chess programs on graphics cards by hardware designers who did it as a hobby; at the time those cards were in many cases higher clocked than CPUs.

NVIDIA is not really revealing which instructions the GPUs have, so this is probably a major bottleneck for your software. To start with, it is quite possible that NVIDIA's entire 'gflops' figure is based on combined instructions, such as multiply-add.
So a stream core can in principle do at most 1 instruction per cycle, yet the numbers NVIDIA throws around are based on operations counted as 2 instructions (so 8 flops) whereas in reality it is 1 instruction on the GPU. If you are not using multiply-add in your program, then that alone already hammers down the gains one can theoretically achieve on NVIDIA. Other manufacturers seem more open here.

Note that approaching NVIDIA privately did not help either. This was for a potential pilot project a while ago for simulation software (simulations of anything from generics up to fighter-jet software). Obviously a pilot project is a pilot project and not some massive deal. NVIDIA indicated it would only *perhaps* reveal something in case of a paper deal for really a lot of 'Tesla' cards. As we know those are a bit pricey (not for the simulator clients), so a deal of hundreds or thousands of pre-ordered cards - that would not work, of course. A pilot project is there to convince people that something is possible.

We know that the bandwidth of the consumer GPUs for GPGPU work is a lot worse than that of the Tesla versions, which is the problem throughout your story. However, I see solutions there. Also, your result is not so bad; most likely you wrote quite efficient CUDA code. So basically what limited your nps, so to speak, is the fact that your program is probably doing very little work per node. When using more knowledge, it is quite possible that you would hardly slow down, as the latency to memory is simply what you keep hitting anyway. So maybe you can make the program more knowledgeable losslessly.

What I am also missing from your side is some technical data on how many bytes each stream core looks up in shared RAM at each node. For example - I might be confusing device RAM here - I thought the cache-line size is 256 bytes per read for each core. So if you use considerably less than that, it is seriously slower, of course.
The algorithm running on the GPU is a straight port, with several optimisations then made to severely restrict memory access. This means the algorithm is a naive sort of parallel algorithm - parallel on a per-board level like the CPU implementation, rather than per-intersection or some other highly parallel scheme. Memory access other than shared processor memory carries a severe penalty on the GPU. Instead, all threads running on the GPU at any one time have to make do with a fast shared memory of 16384 bytes. So:
- The board was compressed into a bit board, using 2*21 unsigned ints per thread (2 * 21 * 32 = 1344 bits)

Maybe it is the case that the GPU is very slow at bit manipulation; these things have not been designed to do some sort of squared cryptography.

- The count of empty, white and black intersections and the ko position was also in shared memory per thread

Doesn't it speed things up near the edges to do things dirt cheap and illegal, and just gamble that illegal results don't backtrack to the root? (This since their playing strength is still really weak.)
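A board layout like the one described - two bit-planes of 21 unsigned 32-bit ints each, 1344 bits total, enough for the 361 points of a 19x19 board - could look roughly like this in C. This is a sketch under my own naming; Christian's actual CUDA data structure is not shown in the post.

```c
#include <stdint.h>

#define BOARD_POINTS    (19 * 19)  /* 361 intersections              */
#define WORDS_PER_PLANE 21         /* 21 * 32 = 672 bits >= 361      */

/* Two bit-planes (index 0 = black, 1 = white), 2 * 21 * 32 = 1344
 * bits per thread, matching the figure quoted above. */
typedef struct {
    uint32_t plane[2][WORDS_PER_PLANE];
} PackedBoard;

static void set_stone(PackedBoard *b, int color, int point)
{
    b->plane[color][point / 32] |= 1u << (point % 32);
}

static int has_stone(const PackedBoard *b, int color, int point)
{
    return (b->plane[color][point / 32] >> (point % 32)) & 1u;
}

/* Minimal self-check: place a black stone on the last point. */
static int packed_board_selftest(void)
{
    PackedBoard b = {{{0}}};
    set_stone(&b, 0, BOARD_POINTS - 1);
    return has_stone(&b, 0, BOARD_POINTS - 1)
        && !has_stone(&b, 1, BOARD_POINTS - 1)
        && !has_stone(&b, 0, 0);
}
```

Note that every access costs a divide/modulo-by-32 plus a shift, which is exactly why the cost of shift instructions on the GPU, discussed later in the thread, matters for this representation.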
Re: [computer-go] CUDA and GPU Performance
On Sep 9, 2009, at 11:57 PM, Christian Nentwich wrote:
Mark, let me try to add some more context to answer your questions. When I say in my conclusion that it's not worth it, I mean it's not worth using the GPU to run playout algorithms of the sort that are in use today. There may be many other algorithms that form part of Go engines where the GPU can provide an order-of-magnitude speedup. Still more where the GPU can run in parallel with the CPU to help. In my experiments, a CPU core got 47,000 playouts per second and the GPU 170,000. But:
- My computer has two cores (so it gets 94,000 playouts with 2 threads)

Later-generation quad-cores hardly have a higher IPC than Core 2; the Core 2 Duo has a brilliant IPC. Nehalem is hardly faster than Phenom II or Core 2 for Diep. It is really the compiler quality and tricks with turbo boost that lure the audience (for example, the experimental test machine at test sites gets cooled down to far under 20C - the entire machine, all components - as there is a power increase of 10% when moving from 25C to 50C; ever seen a machine at home that runs cooler than 50C? Additionally the turbo boost gets manually overclocked to +600 MHz, and the RAM is of a type you can't really afford, and so on). Of course these are multi-billion companies, and every single one of them tries to outdo the others to look better. So really you should compare 1 to 1 power-wise. The GTX 285 then is not so impressive: it is on par with quad-core Nehalems in terms of gflops per watt. I wouldn't call it an outdated GPU, as it is a fast GPU, but for GPGPU it obviously is slow. The latest AMD GPU is, however, 4 times better here.

So your result is at most a factor 2 off for the Core 2 playouts, against a chip that in other areas is on par with Nehalem; you beat it by a factor 2 there. An 8-core machine is not a fair comparison, as those have 2 sockets; you should compare that with the 4-Tesla setup, which has 960 stream cores. The only fair NVIDIA comparison with quad-cores is using Tesla.
Now, I realize it is nearly $1800 a piece, which is a lot for a GPU on steroids, yet that is the fair comparison, to be honest. If we compare things, let's compare fairly. An 8-core Nehalem is the maximum number of cores Intel can deliver in a single fast machine; I am skipping the single-memory-controller 24-core box from a year ago (Dunnington). The 8-core setup you really should compare with the Tesla times 4, so 960 cores. In reality you would take a 300-euro card now. What is there for 300 euro from Intel or AMD? Not a 3.2 GHz i7 965, that is for sure, as that thing costs 1000+ euro. So effectively you lose at most a factor 2, and your NVIDIA code still scales better.

As for the parallel speedup one would get out of game tree search with so many threads versus 4 fast cores: yes, that is not so efficient yet. However, there is a solution there to get a good speedup (at least for chess) that I figured out on paper. If there is one solution, I bet there are more, and also solutions for computer go. The problem as always is getting funded to carry out something like that, as software on a GPU doesn't sell, of course.

- My computer's processor (Intel Core Duo 6600) is 3 years old, and far from state of the art
- My graphics card (GeForce 285), on the other hand, was recently purchased and is one of the top graphics cards

That means that my old CPU already gets more than half the speed of the GPU. An Intel Nehalem processor would surely beat it, let alone an 8-core system. Bearing in mind the severe drawbacks of the GPU - these are not general-purpose processors, and there is much you can't do on them - this limits their usefulness with this algorithm. Compare this speedup to truly highly parallel algorithms: random number generation, matrix multiplication, Monte Carlo simulation of options (which are highly parallel because there is no branching and little data); there you see speedups of 10x to 100x over the CPU.
Matrix multiplication is not THAT easy to solve well. Initial attempts were 20% efficient on NVIDIA GPUs when using faster approximations (so not the stupid simple manner, which is ugly slow, but FFT-wise), and that was for ideally sized matrices. So if it is already that inefficient in the lab, that means there are problems everywhere. It is maturing rapidly now, however.

The 9% occupancy may be puzzling, but there is little that can be done about it. This, and the talk about threads and blocks, would take a while to explain, because GPUs don't work like general-purpose CPUs. They are SIMD processors, meaning that each processor can run many threads in parallel on different items of data, but only if *all threads are executing the same instruction*. There is only one instruction-decoding stage per processor cycle. If any if-statements or loops diverge, threads will be serialised.
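The divergence penalty can be illustrated with a toy cost model (my own simplification, not NVIDIA's actual scheduler): treat a 32-thread group as one SIMD unit, where `taken_mask` has one bit per thread set if that thread takes the 'then' path. If the threads disagree, the group pays for both paths.

```c
/* Toy cost model of a 32-thread SIMD group executing
 * "if (cond) then_path; else else_path;".
 * taken_mask: bit i set if thread i takes the 'then' path
 * (0 = nobody, -1 = all 32 threads).
 * If any thread takes each path, the group executes both,
 * masking out the inactive threads each time. */
static int warp_branch_cost(int taken_mask, int then_cost, int else_cost)
{
    int any_then = (taken_mask != 0);   /* somebody took 'then' */
    int any_else = (taken_mask != -1);  /* somebody took 'else' */
    return any_then * then_cost + any_else * else_cost;
}
```

With a uniform mask the cost is just one path; with a mixed mask it is the sum of both, which is why branchy playout code loses so much of the GPU's nominal throughput.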
Re: [computer-go] CUDA and GPU Performance
On Sep 10, 2009, at 12:55 AM, Michael Williams wrote:
Very interesting stuff. One glimmer of hope is that the memory situation should improve over time, since memory grows but Go boards stay the same size.

Well, you first have to figure out how fast or slow shifting is on the NVIDIAs before actually going for it :) Remember these are graphics processors, not CPUs. Even on some CPUs - I remember the P4 Prescott - it was 7 cycles or more to shift right.

Vincent
Re: [computer-go] Rules for remote play at the Computer Olympiad
Actually, in computer chess it happens just sometimes, and only one team has done it very clearly, and that team is not from Europe but from the Middle East / Asia. The odds that someone from Asia, who hardly makes enough cash to pay for even basic things, cheats are quite a bit bigger than that someone from Western Europe - with, for example in the Netherlands, an average salary of 52000 euro a year for an IT guy - is going to cheat. The former has nothing to lose and is happy even with a 1000 dollars worth of sales in total (and they all will be overestimating what you can make with a tiny computer-go product). So except for this non-European team, I am rather convinced that all the other teams in past years were pretty clean.

As a good chess player it is easy to see when someone cheats; in go that is going to be even easier. Considering how little progress the go program authors have made, other than importing computer chess authors, and considering the problem of people who have nothing to lose, cheating is going to be a much bigger issue.

The huge difference between computer go and computer chess is obviously not only the level of the software, but especially the era. It is 2009 now; there is a range of new possibilities to cheat, either with or without remote machines. Furthermore, away from the public eye, they learn from the computer chess guys how one can cheat without getting detected. So the only manner to detect cheats is the method I described.

The biggest hidden issue in computer games, both chess and go, is probably stolen source code that gets used by some teams, rewritten to their own data structures. Note that, statistically, considering the money that was at stake for some, there have been very few cheats at computer chess events; in normal sports, with normal people who are a tad less clever, the amount of dope that gets used is a lot bigger. In fact, in a normal sport, without ANY form of dope you don't even get into the top 1000.
Vincent

On Apr 4, 2009, at 2:07 PM, Don Dailey wrote:
On Sat, 2009-04-04 at 06:14 -0400, steve uurtamo wrote:
Moreover, this is a really complicated issue.

Yes, and I think cheating will always be possible. It's like cryptography, nothing is ever unbreakable. I was quite appalled at how often it happened in computer chess when I was active in it, and there were also incidents of humans using computers and having the moves transmitted to them. And of course in correspondence chess I think they had to allow computers because the honest players were at a disadvantage. - Don

There has been some extensive statistical work on human cheating in chess done by Ken Regan at the University at Buffalo. However, this relies heavily upon the fact that computers dominate human play by a wide margin. The same is not the case in go. s.

On Sat, Apr 4, 2009 at 1:56 AM, Robert Jasiek jas...@snafu.de wrote:
Vincent Diepeveen wrote: If a program under no circumstance can reproduce a specific move, and that on several occasions, then that's very clear proof of course. [...] Statistics prove everything here.
No. Rather it proves that the program cheats OR that the methods of detecting cheating are improper.
One always must have a logfile
Good. -- robert jasiek
___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Re: GCP on ICGA Events 2009 in Pamplona
On Jan 14, 2009, at 1:42 PM, Mark Boon wrote:
It's difficult to get hard data about this. Go is only the most popular game in Korea. In other countries like Japan and China it comes second by far to a local chess variation. Possibly Chess is more ingrained in Western culture than Go is in Asia, I don't know really. But Chess has the population numbers of West vs. East against it. If there are more chess players than go players in the world, it won't be by much. But the Go market is probably a lot bigger. Look only at the money in professional Go tournaments. It's probably an order of magnitude more than the money in professional Chess. But I must admit this is just a guess of mine. Mark

Oh la la,

The origins of chess and go aren't far from each other. Chess originates from India; go not far from there, if you look at it from a global perspective. Both go and chess are really similar in that they are symmetric games. From a strong player's perspective there isn't that much difference in intellectual experience between the games: a good chess player can be a good go player and vice versa. Quite the opposite with the zillions of checkers versions there are - checkers is NOT a symmetric game.

If chess were added again to the Olympic sports (which I doubt will happen, but you never know how political decision-making goes), that would be good news for China's women's team. Maybe they lose on board 1 sometimes, but on the other boards they will win all their games. Chess is becoming really big in China now (heh, I'm still looking for a girlfriend - know a Chinese female go or chess player?). I'm quite sure chess is by now bigger in China than go is there. Of course the step from Chinese chess to chess is really tiny as well. Chess gets played in every nation on the planet. Chess is probably tiniest in Japan; shogi and go seem to be more popular there. South Korea also used to be really tiny for chess; a new initiative there might boost it a tad.
Chess's advantage is of course that the game is a lot quicker than go. For serious, strong players that is not an advantage, but for the 'big masses' it is. Chess computers used to get exported to 105+ nations worldwide. In the rest of the planet, with the exception of Japan and Korea, go barely exists.

There is no doubt that some very successful chess and go players are very, very wealthy. If you're good at such a game, you have good brains of course; everyone likes to pay you, and most top chess players even get asked to run a business for some billionaire-type guys. I don't doubt that's identical in go. Whether one go player has more billions of wealth than a chess player is not very interesting. For the 'subtop', though, chess is quadratically bigger than go. How many people live from chess? Thousands - amazing yet true. Whereas in a few nations like the Netherlands the number of chess players who are federation members is shrinking - realize the tiny size of the nation here, the Netherlands has exactly 16.5 million inhabitants - even then, each town still has a chess club. Chess is totally booming in India, China, Turkey and Spain; together these nations already total nearly 3 billion people. When I was in China, I saw zero go boards anywhere.

Vincent

On Jan 12, 2009, at 9:22 AM, steve uurtamo wrote:
i think you might be estimating this incorrectly. s.
On Sat, Jan 10, 2009 at 9:00 AM, Gian-Carlo Pascutto g...@sjeng.org wrote:
Ingo Althöfer wrote: What prevents you from freezing your chess activities for the next few months and hobbying full (free) time on computer go?
The number of chess players compared to the number of go players.
-- GCP
Re: [computer-go] Re: GCP on ICGA Events 2009 in Pamplona
hi,

You're miscounting here completely again. Counting the number of federation members is a bad idea; count the number of people who know a game and regularly play it. Draughts (international 10x10 checkers, using Polish rules) is really tiny. It is no longer the culture to become a member of a club in a nation like the Netherlands - the Netherlands is a very weird nation in that respect - yet many learn the game. You shouldn't let yourself be led by federation numbers, but by the number of people who know a game and play it regularly. Active official competition players is simply a bad measure.

The online chess servers are indeed a bad measure too; play is spread over a number of servers. Yahoo has a daily load of about 7000 playing nonstop. chessclub.com, which used to be a paid club and still is, gets about a million visitors, yet as a membership costs 50 dollars - nearly 500 yuan - a year, that's a price I'm not willing to pay either. chessclub offers free membership to titled players, yet I'm titled and I have to pay, how about that? Having 2 Dutch titles and a FIDE master title seemingly doesn't count :) The strongest players are at chessclub.com; it has about 2500 nonstop (24-hour) load now. ChessBase is becoming a tad larger now, at about 3000 load - note, also a paid chess club - yet again you either have to buy a product of theirs for 50+ euro to get a free 1-year membership, or you pay 25 euro a year for membership.

There is a range of other servers, each with about 500-1000 players, most of them paid; there chess is really doing badly, of course. Experience teaches that when you set up a free server with a good client, you soon get a huge load; 5000+ is very normal. There are a lot of tiny communities with 400+ players online 24 hours a day. That's what you get when 105+ nations play chess: each federation wants its own server, yet they do not want to pay for it. FIDE really is doing very badly there.
Their problem would be that they go to commercial companies, which all have their own interests, instead of using a few of their own guys who already work in organisations and are objective.

Vincent

On Jan 14, 2009, at 4:06 PM, Mark Boon wrote:
On Jan 14, 2009, at 12:43 PM, Thomas Lavergne wrote:
Counting xiangqi and shogi players as chess players is a bit unfair...
Sorry if I caused confusion, I didn't mean to count those as chess players. I just stated that to show that despite large population numbers in, say, China, most of those people play Xiang-Qi rather than Wei-Qi. This is in contrast to a large country like Russia, where I believe Chess is by far the most popular. In Holland, however, Chess comes only in third place (or maybe even lower) after Bridge and Draughts. Mark
Re: [computer-go] Rules for remote play at the Computer Olympiad
Hi,

I see there has been some discussion on this list about remote cheating. In computer chess this toleration has grown out of hand; setting the rules clear and sharp in computer go might avoid a lot of problems in the future. There is a very simple manner to avoid cheating in go, but let me address a few points first.

1) Neural nets. Forget about neural nets and cheating. A year or 12+ ago we had a lot of neural net guys in computer chess as well. ANNs are not even close to representing the human mind, as modern insights into how brains work have shown clearly for quite a while. Most important is that the automatic learning techniques of neural nets are so totally inefficient that it is really difficult to use them well. Soon anything that is a neural net will be beaten major league by the rest. The only case I remember that was a tad more stubborn was basically someone who tried to fool the rest: he bought source code from someone and sold it as the 'neural net optimized version' of the engine. Yet the original programmer of that code (Joost Buijs) assured me that this program definitely used his parameters and not some neural-net-optimized parameters, as he could reproduce even every score of it. So in that case the neural net was not really there; it was just being used as a sales pitch. The neural nets will get beaten major league - if not next year, then some years later. You can't forever improve a product without good methods for debugging what it is actually doing. A black box that is really clever and intelligent doesn't exist.

2) Sure, on paper it is really easy to cheat. IT IS ALSO REALLY EASY TO CHEAT WITHOUT REMOTE MACHINES, IN FACT. Oh, in computer chess we've seen it all. There is a certain species of persons on the planet - they are not there in big numbers - who are capable of fooling other persons in a professional manner; James Bond is nothing compared to the sneaky manners in which they get things done.
For sure a bunch of them will also try it in computer go. For these guys, considering how weakly go computers will play for the coming few years, there is not a big difference between remote machines and local machines; it is too easy for them to cheat. Communication to and from a program is too difficult to monitor 100%. So to speak, just keeping the mouse at a certain spot is already enough to cheat, or having someone a few meters away at his laptop signal something over Bluetooth or whatever to the playing machine. All of this has been done.

The only really simple manner of catching crooks is having a good tournament director who will enforce, in case of suspect moves, that an engine must reproduce the move it played, with some reasonably decent score. Now some of you will argue loudly in one choir: parallel search doesn't reproduce moves. One move can make or break a game, yet those who cheat have the habit of cheating on many moves per game and over several games, so there should be many data points one complains about. The tournament director, at his discretion, might also want to see whether a move in question has a score very similar to other alternatives (in which case of course you don't know which of the equally or nearly equally scored moves gets played). But the principal thing is reproduction of great moves: if a program under no circumstance can reproduce a specific move, and that on several occasions, then that is very clear proof of course. That is a rule that should be introduced in computer chess too, IMHO.

Note there are also the time constraint and the search depth constraint. One must always have a logfile displaying which iterations or steps a program has already performed. If one searches in a very selective manner, a rather easy form of cheating that is 'near undetectable' is when a program plays moves that it normally would not have found on that hardware, but iterations deeper.
As in selective search you can assume that in a move sequence m0..mN the moves 0..N represent the line that one needs to see to find it, there will of course be moves in that sequence that might have been reduced somehow. So if one takes care that such a sequence already gets searched deeper in the 'hashtable', by manually enforcing that sequence, then the program 'learns' that move sooner from the hashtable. Now in chess this is easier than in go currently, as the search method used is reductions, but it'll come in go also. Really effective is feeding in the 'main lines' of your opponent to be searched fully by a number of cores. Yet again the only way to really detect this is by forcing reproduction of the moves by the tournament director. If a system can't reproduce enough of the great moves played, for whatever reason: bad luck. For parallel systems whose search is totally non-deterministic, there is also a simple lemma;
Re: [computer-go] On Don Dailey's first chess program
Heh Don, paranoia attempts to keep hackers away from hacking your software :) On hacking: i found the Fritz5 protection the most ingenious protection ever. You just had to modify 2 variables in an inifile to 'hack' it. All hackers could do this, but the average user had no clue how to edit an inifile, so all 'normal users' had to buy it, if they didn't know how to copy CD-ROMs that is. So i bet their sales must've been really good, as there was nothing to 'hack'. Vincent

On Nov 22, 2008, at 7:40 PM, Don Dailey wrote: On Sat, 2008-11-22 at 17:54 +0100, Ingo Althöfer wrote: Hello Don, thx for all your answers. I think I found a website where old programs (from the 1980s and early 90's) are listed: http://www.septober.de/chess/index.htm# There are also screenshots of RexChess http://www.septober.de/chess/pics/9102.gif and Colossus X (by Martin Bryant) http://www.septober.de/chess/pics/9001.gif I think in those days Martin's programs were the best-sold chess software in Europe. I heard he even bought a Ferrari from the money he earned with Colossus. ** Again my wish: When you speak or write about software piracy in Europe (which really exists): Please do not throw all regions or countries or people into one big pot. I have many friends in Europe who honestly pay for their games software. And some of them (including me) are sensitive when someone writes (openly or between the lines) that software piracy is the standard in Europe. That was not my intent. I think software piracy is a standard in all countries. It's pretty much accepted practice everywhere. I booby-trapped one Palm program I wrote (not Ogo). Sure enough, within a day or two of me putting it up on PalmGear, someone I know discovered a site where you could get it for free. Someone thought they had broken the copy protection.
The easy way to copy-protect a Palm program is to associate the Palm user's name (with which you identify your Palm) with a hash key. The secret key you send them must match the hash of their user name. I did that, but it was a decoy. I arranged it so that there were routines which checksummed the entire executable for a somewhat more sophisticated test. I did other things to obfuscate things, including making the obvious hacks look like they had succeeded and putting the test in several places, but with different-looking code. The obvious hack would work for a while, long enough for the hacker to think he had succeeded. In short, I wanted the hacker to get frustrated, thinking he had succeeded at first, but eventually wondering if he had really found all the traps. He would never be 100% sure even if he got past the first one, or the second one. I didn't do anything malicious, but if the program wasn't hacked correctly it would crash the machine with a message proclaiming that this was a hacked copy. The message of course was encrypted and hopefully difficult to find. When the machine crashed, it would require a reset of the machine, not just a reboot. I took the hacked version and tested it to see if the hacker had actually succeeded. It looked like he had, but after getting in and out of the program the required number of times, I got the message! Yeah! I had embarrassed the hacker! It's not possible of course to fool a determined hacker; no matter what you do there is a way around it, but my goal was to try to make it not worth the hacker's trouble. - Don Best regards, Ingo ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
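The visible layer of the decoy scheme Don describes can be sketched in a few lines of C. Everything concrete here is an assumption: FNV-1a stands in for whatever hash the real Palm program used, and the function names are made up.

```c
#include <stdint.h>

/* Sketch of the visible (decoy) layer of the protection Don describes:
   the unlock key the customer receives must match a hash of the Palm
   user name.  FNV-1a is only a stand-in for the real (unknown) hash. */
static uint32_t name_hash(const char *name) {
    uint32_t h = 2166136261u;              /* FNV-1a offset basis */
    for (; *name; name++) {
        h ^= (uint8_t)*name;
        h *= 16777619u;                    /* FNV-1a prime */
    }
    return h;
}

/* The obvious check a hacker finds first; per Don's story, the real
   traps (executable checksums, delayed crashes) live elsewhere and
   look like unrelated code. */
static int key_valid(const char *user, uint32_t key) {
    return key == name_hash(user);
}
```

The point of the decoy is exactly that this check is easy to find and patch out; the checksum routines Don mentions would then catch the modified executable much later.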
Re: [computer-go] programming languages
On Oct 9, 2008, at 10:39 PM, Don Dailey wrote: On Thu, 2008-10-09 at 15:20 -0400, [EMAIL PROTECTED] wrote: Computers + random = can of worms. What if I get a fast benchmark by implementing the fastest, most awful random number generator imaginable? What if every one of my random playouts looks disturbingly similar? As to the relevance, the time-eaters in my heavy playouts are different from the time-eaters in my light playouts. This is true, but it goes back to the well-understood fact that you cannot have a perfect benchmark. I think this would be very useful and very relevant, just not perfect. Random numbers IS an issue. I required transparency so that any issue like this can be observed, reviewed, criticized, etc. However, there is a solution. I don't think this solution is necessary, but it is possible: George Marsaglia published a set of very high quality random number generators that can be implemented with just a couple of lines of code. We could specify that a specific random number generator be used. After all, this is only a benchmark. But these are fast generators. The final step, which I don't personally support because I think it's too restrictive but is also possible, is to require these programs to be deterministic. So you could SPECIFY the exact starting seed for a random number generator, as well as the specific method (or equivalent) to be used to select moves randomly. I don't really like that idea, but it makes it possible to perfectly verify the implementations. I don't like it simply because it places additional constraints on creativity and flexibility. You might come up with a very clever and fast way to generate uniformly random moves, for instance, that doesn't work within the framework. Or you may want to benchmark some language that doesn't like 32-bit data types (perhaps 31-bit types, for instance). A lack of flexibility here would unduly punish you.
So I'm happy to allow some leeway at the expense of perfect verification, as long as we have transparency: anybody who wants can see the code and criticize it or find flaws. (His implementation is FAST, but he cheated a little on the random numbers.) It's also pretty easy to not pick moves uniformly at random - I think some on this group may have made that mistake. This should show up if we have good benchmarks. Someone will notice something slightly off in the statistics reported and it will be examined or seen to be flawed.

Hi, i usually hate replies to my posts that zoom in on 1 tiny detail, apologies for doing that. However, i'd like to zoom in on the uniformity of RNGs. Of course we all make an effort to get uniform RNGs. In fact the net is overloaded with them. Years ago i googled and stumbled upon RanRot from Agner Fog. http://www.agner.org/random/theory/ Like all Fibonacci-based RNGs it spreads completely uniformly. Now THAT has a few problems. A year or 8 ago i did some work for a guy called Avery Cardoza, who released a game called Casino 2000. Obviously it has roulette too. Now if we take the European roulette version, which has a single zero (US casinos have 2 zeros, a 0 and a 00, which makes the odds quite a bit better for the casino), there is on paper 1 strategy that might deliver money. Might, because playing with a system is forbidden of course. On paper there is 1 system that wins. You put down 1 chip; if you lose that bet you double, and if you lose again you double again. So you bet 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1000. Hah, why 1000? Because usually you won't get further than doubling 10 times: each table has a max bet. So let's limit the number of bets to 8, i.e. 7 doublings. Not a single RNG that is considered 'good', no, not even the Mersenne twister, *approaches* the reality of what happens in casinos there.
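The doubling system above, capped at 8 bets as described, is easy to put into code if one wants to test an RNG against it. This is a sketch under assumptions: rand() is just a placeholder for the generator under test, and odd pockets stand in for a red/black bet (18 of the 37 pockets win, as on a single-zero wheel).

```c
#include <stdlib.h>

/* Martingale ("doubling") sessions on a single-zero wheel: bet 1 unit
   on an even-money chance, double the stake after every loss, and give
   up after 8 bets (7 doublings), as in the text.  rand() is only a
   placeholder for whichever RNG one wants to test. */
long long martingale_sessions(int rounds) {
    long long bankroll = 0;
    for (int r = 0; r < rounds; r++) {
        long long stake = 1;
        int won = 0;
        for (int bet = 0; bet < 8 && !won; bet++) {
            int pocket = rand() % 37;        /* 0..36, pocket 0 is the zero */
            if (pocket != 0 && pocket % 2 == 1) {
                bankroll += stake;           /* 18 winning pockets out of 37 */
                won = 1;                     /* round over after one win */
            } else {
                bankroll -= stake;
                stake *= 2;                  /* double after a loss */
            }
        }
    }
    return bankroll;
}
```

Each round wins 1 unit with probability 1 - (19/37)^8, about 0.995, and loses the full 255 units otherwise, so the expectation per round is roughly -0.24 units: the system loses slowly but surely, and comparing the simulated loss rate across RNGs is one way to probe the too-uniform behaviour discussed above.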
Now i do not mean the 'bad' casinos, but simply that the profit you make on paper is a lot higher than in reality (apart from the fact that they'll kick you out of the casino). The freaking nature of all those bit-flippers is such that i'm *assuming* nowadays, in some algorithms, that they can produce a field for me in O(n log n). So not only are they too uniform, they can also produce an entire field, covering every element, in O(n log n), without a single failure. Now THAT'S interesting :) - Don - Dave Hillis -Original Message- From: Don Dailey [EMAIL PROTECTED] To: computer-go computer-go@computer-go.org Sent: Thu, 9 Oct 2008 2:44 pm Subject: Re: [computer-go] programming languages On Thu, 2008-10-09 at 14:17 -0400, [EMAIL PROTECTED] wrote: The http://shootout.alioth.debian.org/ site, ... If we, as a community, could come up with a sufficiently detailed description of the light playouts algorithm (see the current Light simulation : Characteristic values thread), there is no reason that this algorithm couldn't join them. Many potential ways to speed up light playouts involve
Re: [computer-go] programming languages
Sure. A lot faster is RanRot in 64 bits; at a K8 2.2GHz with an old GCC it is about 3.3 ns per number, so i assume it's quite a tad faster than that on a Core2. Note it's quite slow on Itanium, about 9-10 nanoseconds a number, as it appeared later that Itanium has no hardware rotate instruction :) Here is how i rewrote it:

/* define parameters (R1 and R2 must be smaller than the integer size): */
#define KK 17
#define JJ 10
#define R1 5
#define R2 3
#define BITBOARD unsigned long long

/* global variables Ranrot */
BITBOARD randbuffer[KK+3] = { /* history buffer filled with some random numbers */
  0x92930cb295f24dab, 0x0d2f2c860b685215, 0x4ef7b8f8e76ccae7, 0x03519154af3ec239, 0x195e36fe715fad23,
  0x86f2729c24a590ad, 0x9ff2414a69e4b5ef, 0x631205a6bf456141, 0x6de386f196bc1b7b, 0x5db2d651a7bdf825,
  0x0d2f2c86c1de75b7, 0x5f72ed908858a9c9, 0xfb2629812da87693, 0xf3088fedb657f9dd, 0x00d47d10ffdc8a9f,
  0xd9e323088121da71, 0x801600328b823ecb, 0x93c300e4885d05f5, 0x096d1f3b4e20cd47, 0x43d64ed75a9ad5d9
  /*0xa05a7755512c0c03, 0x960880d9ea857ccd, 0x7d9c520a4cc1d30f, 0x73b1eb7d8891a8a1, 0x116e3fc3a6b7aadb*/
};
int r_p1, r_p2; /* indexes into history buffer */

/* AgF 1999-03-03
 * Random Number generator 'RANROT' type B
 * by Agner Fog
 *
 * This is a lagged-Fibonacci type of random number generator with
 * rotation of bits. The algorithm is:
 * X[n] = ((X[n-j] rotl r1) + (X[n-k] rotl r2)) modulo 2^b
 *
 * The last k values of X are stored in a circular buffer named randbuffer.
 *
 * This version works with any integer size: 16, 32, 64 bits etc.
 * The integers must be unsigned. The resolution depends on the integer size.
 *
 * Note that the function RanrotAInit must be called before the first
 * call to RanrotA or iRanrotA.
 *
 * The theory of the RANROT type of generators is described at
 * www.agner.org/random/ranrot.htm
 */

FORCEINLINE BITBOARD rotl(BITBOARD x, int r) {
  return (x << r) | (x >> (64 - r));
}

/* returns a random number of 64 bits unsigned */
FORCEINLINE BITBOARD RanrotA(void) {
  /* generate next random number */
  BITBOARD x = randbuffer[r_p1] = rotl(randbuffer[r_p2], R1) + rotl(randbuffer[r_p1], R2);
  /* rotate list pointers */
  if (--r_p1 < 0) r_p1 = KK - 1;
  if (--r_p2 < 0) r_p2 = KK - 1;
  return x;
}

/* this function initializes the random number generator. */
void RanrotAInit(void) {
  int i;
  /* one can fill the randbuffer here with possible other values */
  randbuffer[0] = 0x92930cb295f24000 | (BITBOARD)ProcessNumber;
  randbuffer[1] = 0x0d2f2c860b000215 | ((BITBOARD)ProcessNumber << 12);
  /* initialize pointers to circular buffer */
  r_p1 = 0;
  r_p2 = JJ;
  /* randomize */
  for (i = 0; i < 300; i++)
    (void)RanrotA();
}

Please note there are faster RNGs than RanRot, with some shifting here and there. But maybe RanRot is sufficient: you won't search enough nodes in the coming 10 years to get into trouble with it :) Note that the parameter 'ProcessNumber' is the process number of each process (or thread) of the search. Safely works up to 4096 cores :)

On Oct 10, 2008, at 12:36 AM, Don Dailey wrote: On Thu, 2008-10-09 at 15:20 -0400, [EMAIL PROTECTED] wrote: Computers + random = can of worms. Has anyone seen this: http://home.southernct.edu/~pasqualonia1/ca/report.html#files They are claiming impressive speed and high quality for a random number generator. The code is compact and small, much nicer than mt19937, and the speed seems to blow mt19937 out of the water. I haven't looked at any papers on this and I'm wondering how good it is. Here is a quote: The cellular automaton outperforms the GSL random number generators, being more than three times as fast as the GSL generators.
The following table shows the mean time for 10 runs of each generator, with each run producing 10 million integers. Source code for both the GSL generators and the cellular
Re: [computer-go] Super-duper computer
That ChessBrain is a commercial attempt of a few guys looking for money, not an attempt to really search in parallel in a decent manner. This is why they of course kept their logfiles all 'secret'. I'm not the only one who offered them my help, without payment, but that wasn't accepted at all. In itself weird; they have little experience in parallelizing game tree search on many processors. The software they use is being used in an embarrassingly parallel manner. So their speedup is really ugly of course. An 8-core will easily totally outcalculate ChessBrain with respect to search depth. Seems to me the project died; a shame in itself. It is very complicated to get something to work that can play in realtime. Long-term analysis projects are more viable and really interesting for several reasons. Yet what you see is that most of those projects keep the parallel search framework source code secret. They just publish an article claiming victory, without data to back that up in a statistically significant way. That means, in short, that no one can learn from them, nor objectively draw conclusions. A team that isn't making a supreme effort to get a good speedup will of course not succeed. Vincent

On Sep 28, 2008, at 2:15 PM, Claus Reinke wrote: If you're looking for spare processors, how about a [EMAIL PROTECTED] program for Go?-) It appears that the Chess community has had such a project already: ChessBrain: a Linux-Based Distributed Computing Experiment http://www.linuxjournal.com/article/6929 ChessBrain II - A Hierarchical Infrastructure for Distributed Inhomogeneous Speed-Critical Computation IEEE CIG06, Reno NV, May 2006 (6 pages) (IEEE Symposium on Computational Intelligence and Games) http://chessbrain.net/docs/chessbrainII.pdf old project site: http://chessbrain.net/ From the ChessBrain II paper, it seems they considered Go, but before the recent developments that made parallel processing promising.
The papers might also be interesting for their discussion of parallel tree search and communication issues. Claus Local versions of the top programs could offer to connect to their main incarnation's games, explaining internal state (it is sure it will win, it thinks that group is dead, ..) in exchange for borrowing processing resources. Or, instead of doing this on a per-program basis, there could be a standard protocol for donating processing power from machines whose users view a game online. That way, the more kibitzers a game attracts, the better the computer player plays; and if the game cannot hold an audience, the computer player might start to seem distracted, losing all those borrowed processors;-) Mogo might even find some related research at INRIA ([EMAIL PROTECTED] style (desktop) grid computing, ..), so perhaps there's scope for collaboration there? Claus Q: why do you search for extra-terrestrial intelligence? A: we've exhausted the local search space.
Re: [computer-go] Analysis of 6x6 Go
Do you also guess in go that the side who begins wins the game? Vincent On Sep 22, 2008, at 9:08 PM, Erik van der Werf wrote: On Mon, Sep 22, 2008 at 7:14 PM, Ingo Althöfer 3-Hirn- [EMAIL PROTECTED] wrote: Does someone here know of other (documented) attempts to solve 6x6 Go? Didn't Erik van der Werf do it under his rules? He did it for 5x5 Go, see http://erikvanderwerf.tengen.nl/5x5/5x5solved.html Several 6x6 positions were solved, but not the empty board. E.g., for the following position we could prove a Black win by at least 2 points (which took about 13 days in 2002).

. . . . . .
. . . # O .
. . # O . .
. . # O . .
. . . # O .
. . . . . .

Optimal play on 6x6 under Chinese rules is expected to give a Black win by 4 points. Erik
Re: [computer-go] Bobby Fischer
Don, when i play or analyze with a world top player, like top 10 of the world, and i do not get that chance a lot in my life, then i can really assure you: from a social viewpoint, maybe YOU are far better than any of the Russians i've ever played. But from a technical chess viewpoint, opening, middlegame, endgame, strategy and accuracy, they're a different type of level. The average nerd here will not be able to imagine this. They're *so* much better. A few months ago i had the honour of playing someone from India from the highest caste; i concluded that he was better than me in every respect: strategy, tactics, opening, memory (he remembered way, way more), not to mention endgame. Today's world top players, regardless of whether you speak of a 9-dan go player or the highest rated chessplayers, are a league of their own, and as a human race we can only be proud of having them. Now it is because the games have a limited board size and a limited set of rules defining the play that with software we can or will beat them. Yet you must realize very well that a strong chessplayer is also a strong go player and that a strong go player is a strong chessplayer. Note this is NOT the case for checkers. Go and chess are both SYMMETRIC games. Strategy and tactics are similar. The indirect manner of winning at chess and at go is the same. In go it is also the end goal: maximizing your influence levels; in chess that's similar to maximizing mobility. 10x10 international checkers, the toughest and most widely played variation of the game, not to be confused with the simplistic and nearly dead-analyzed 8x8 checkers, is an asymmetric type of game. You cannot move to any part of the board; unlike chess/go, you can only move forward and never go back. That asymmetric nature of the game really is of overwhelming importance. It totally changes the nature of the game.
Typically, world-champion-level 10x10 checkers players are very poor chess players; most are rated 1600-1800. That's not even enough to play in the national competition here. My club has 4 teams (which is a lot) in the national competition. Not a single one of the titled checkers players is strong enough to play there. On the other hand, if a professional go player knocked on my door right now: 'heh, i'm a 5-dan professional at go, i'd like to play some chess this season', i would directly call the federation: 'heh, we have a new player for our second team'. Within 1 day after learning the basic rules he/she could already easily play in the first division. If you realize that this is just 1 grade underneath the professional division (masterclass), and that there are 2 more grades (second class and third class) where we also have 2 teams, you'll realize the huge difference. First class is 2100-2350+; a few GMs (Jan Timman) and a few IMs play there as well. If you want to know how strong a pro is, play him for money. I know the double-sided effect of the above statement: in chess the big problem with those chess programs/computers has always been that people got paid to lose. This is because most sponsors of matches always had more interest in winning a match than in a fair competition, or they set the matches up wrong. It is cheaper to hire a professional player for a lump-sum fee, as he then doesn't need to achieve a thing and can relax and do his best the next day when playing in an all-GM tournament. In the few matches we had where GMs got paid a lot more for winning, the computer has never ever in history won. Kramnik demanded 1 million dollars from a sheikh as 'advance payment'; the Hydra team paid a lump sum; Kasparov played such beginner's level against Deep Blue in the second match, just like he did in the first match (which was enough to win): he played at about Elo 2100, and even today i'm not sure why. There are 2 different statements that reached me.
Several Russian GMs (some living in the west) swore to me that he had bet millions on the computer (he could get good odds); some other official who was there during the match at the side of the Kasparov team (whose name i won't reveal, as i forgot who it was) told me clearly there was a 'contractual' issue. Personally i tend to believe the last explanation more than the first: Deep Blue was so outdated by 1997 from an algorithmic and evaluation viewpoint that under no condition could they have played another match without looking like total beginners compared to the PC software programs. They searched 10-12 ply with the thing, as the logfiles also prove clearly. 2 years later, at the open hardware world championship of 1999, the quad Xeons (400-500MHz) searched 14-17 ply hands down, and a lot more than that in the endgame. I myself reached 20+ ply on a quad Xeon in endgames where Deep Blue got 12. Not a single one of all those matches has ever been objective. Human players have 1
Re: [computer-go] Lockless hash table and other parallel search ideas
On Sep 10, 2008, at 2:12 PM, Don Dailey wrote: The rules are exactly the same for 9x9 as for 19x19. The boardsize is different and that changes the game some. I would suggest that if a top go player plays a game of chess immediately after first learning the rules, he would lose very badly to even mediocre players or even advanced beginners.

Usually i hesitate to comment upon just 1 statement out of an entire story. Additionally you have the advantage of native English skill; kind of an Obama-type statement that you can still define an 'advanced beginner' as being a titled player who doesn't make money with it. But we must correct you here, in case you no longer see yourself as a beginner or as an advanced beginner. Directly after learning the game of chess, a strong go player will be able to win against you. Strategically and tactically they're that much above your level; they will completely annihilate you. I agree that there are big differences between 9x9 and 19x19. 9x9 is a very simple game compared to 19x19. From a computer-algorithms viewpoint, hard forward pruning is tougher to do there, explaining why the current random searching methods do so badly at 9x9 and a lot better at 19x19, relatively speaking. In an absolute sense they will fail at both games of course. The only proof the current random searching methods give in go is that searching deeply in a random manner is better than searching short lines in a super-dubious manner. IMHO this is logical. The fact that there are zero full-time salary prospects to get clever guys interested in putting years of full-time effort into 9x9 is the reason why the programs still play it so weakly, as just a single one of them would raise the game quality a lot. It is of course a combination of evaluation and search. What you typically see is that players with little domain knowledge in chess or go are busy with it now, doing a brute-force attempt.
Note that computer-go has one big advantage over computer-chess: because there are few sales to achieve, there is little money at stake, and that gives the advantage that the programmers at least communicate with each other, in a forum like this and at tournaments. In computerchess it is very difficult to find talkative persons. The main progress that happens in computerchess is simply by debugging other persons' code. It goes that far that a Russian author has even simply automatically converted the program Rybka 1.0 back from assembler into C code; it is called Strelka. This communication problem is why progress in computerchess goes relatively slowly. Still, many brilliant guys get lured into it, because chess gets played in 105+ nations. You see clearly now that they do not put much effort into it nowadays, as just like in computer-go there is nearly no money to be made with an engine (in contrast to a GUI). It is then self-explanatory that those who have a parallel go program still didn't figure out what computerchess already knew in the 80s: namely that to run in parallel you need to avoid central locking, and need to search with a global hashtable (Feldmann, somewhere in the 80s) and a decentralized form of doing the search. That sure involves locking if you want a good speedup, something Leiserson and co. with Cilk did not manage, nor Feldmann, but it is all very well solvable with a big effort. Yet those big guys who were busy with chess in the past years, who already knew back in the 80s a lot about decentralizing the parallel search, seem absent in computer-go. Vincent

I really doubt this would be the case with 9x9 go. I don't think you can really make a strong argument that 9x9 isn't go or that it's not the same game. You CAN argue that the characteristics of the game are different and different aspects of the game are emphasized.
Some really strong players may not be specialists in 9x9 and may lose to players who specialize in 9x9 but are otherwise a few stones weaker at 19x19, but that's not remarkable. In chess you can also be weakened significantly and be thrown off your game by a surprise opening - or we could imagine a game where your opening is decided for you, and it would make you very uncomfortable. My guess (and it's only a guess) is that strong players playing on the 9x9 board are simply very uncomfortable but probably do not play as weakly as they imagine. In chess I heard that someone once did a study to find out if playing speed chess weakened the performance of some players more than others, and despite the fact that many players imagine that it does, it turned out that there was a remarkable correlation, although no doubt some players who specialize at different time controls would have an edge. - Don On Wed, 2008-09-10 at 11:27 +0900, Hideki Kato wrote: Christoph Birk: On Tue, 9 Sep 2008, Olivier Teytaud
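The "avoid central locking, share one global hashtable" idea running through this thread has a well-known concrete form: the XOR-validation trick published by Hyatt and Mann for chess transposition tables, which lets many cores share one table with no locks at all. A minimal sketch (entry layout, sizes and names are my own illustration, not Diep's or any engine's actual code):

```c
#include <stdint.h>

/* One 16-byte entry: a check word and a 64-bit packed payload (score,
   depth, best move...).  Instead of locking, we store key ^ data as the
   check word; if another thread's write tears an entry in half, the
   XOR test fails and the probe simply misses instead of returning
   corrupt data (the Hyatt-Mann lockless hashing trick). */
typedef struct {
    uint64_t check;   /* key ^ data */
    uint64_t data;
} HashEntry;

#define TABLE_BITS 20
static HashEntry table[1 << TABLE_BITS];   /* 1M entries, 16 MiB */

void hash_store(uint64_t key, uint64_t data) {
    HashEntry *e = &table[key & ((1 << TABLE_BITS) - 1)];
    e->check = key ^ data;    /* two plain stores, no lock taken */
    e->data  = data;
}

int hash_probe(uint64_t key, uint64_t *data) {
    HashEntry *e = &table[key & ((1 << TABLE_BITS) - 1)];
    uint64_t d = e->data;
    if ((e->check ^ d) == key) {   /* consistent entry for this position */
        *data = d;
        return 1;
    }
    return 0;   /* empty slot, другой position, or a torn write */
}
```

The price is that a rare torn write costs you one hash hit; the gain is that hundreds of cores can hammer the same table with zero lock contention, which is what makes a single global table feasible at all.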
Re: [computer-go] Super-duper computer
Yeah, it won't be long until we've got petaflop hardware in PCs. Currently it is double-precision floating point crunching power in CELL-type hardware. In short, using hashtables is going to be very complicated on that type of hardware, and if you're not using them, your efficiency is not so high (understatement). Are you sure you can run on CELL CPUs? Maybe it is time to ponder a more efficient form of search, instead of piling up inefficiency? Vincent

On Sep 7, 2008, at 4:40 AM, Darren Cook wrote: This beast goes online in 2011. Better start lobbying now for some Mogo time. http://www.networkworld.com/community/node/32152 By coincidence I was looking at the Top 500 list yesterday and the top machine already does petaflop (peak) performance [1]. I wonder how many playouts/second Mogo would do on that :-). Darren [1]: http://www.top500.org/blog/2008/06/14/preview_31st_top500_list_world_s_most_powerful_supercomputers_topped_world_s_first_petaflop_s_system -- Darren Cook, Software Researcher/Developer http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic open source dictionary/semantic network) http://dcook.org/work/ (About me and my work) http://darrendev.blogspot.com/ (blog on php, flash, i18n, linux, ...)
Re: [computer-go] mogo beats pro!
Congrats to the MoGo team for getting system time at SARA for a match. The architecture of the Power5/Power6 system (in July 2007 a Power5 system was installed, and that has been upgraded to Power6 now) is based upon having sufficient RAM and high bandwidth to i/o (for each Gflop a specific amount of i/o bandwidth is there). When they opened the machine (when i stood in front of it at SARA), you could see that a Power6 is basically 1 big block of metal that eats a shitload of power. A single rack with a few power blocks eats more power than its entire storage (which is a huge wall of harddrives). Huygens has 8 dual-core processors per node, 16 cores a node in short, which are very bad for integers. For most go/chess/checkers-type applications a 4.7GHz Power6 single core performs like a 2.4GHz Core2. That is because it focuses upon a high number of instructions per cycle, basically floating point (double precision). The Power processor is not so great for the integer work of most programs. Its IPC (instructions per cycle) is far under 1.0 for most programs. For such hardware i therefore usually did a PGO run with Diep that lasted 24 hours. After that the compiler software has more datapoints to optimize the software better. That gives a huge speed boost. Each Power6 block is connected with a highspeed network to the other nodes. There are 2 physical network lines; i guess one for i/o and one for communication (RAM). So the latency + bandwidth from 1 node to another is really bad compared to the huge crunching power of 1 node, from a RAM-communication viewpoint. A blocked get will be between 5 and 10 microseconds. That's not so fast. From a calculation viewpoint (instructions per cycle executed), the Power6 is not so fast when you need more crunching power than 1 block. So for example for my chessprogram, having 1 node would be interesting, as that's a lot faster than a PC. I wouldn't ask for more than 1 node though; the hashtable is too important for a program like Diep.
Even though its nps is low compared to other chessprograms (it has the biggest evaluation function of all chessprograms), it still does 200k nps a core * 16 cores = 3.2 million nps. That's 3.2 million reads and 3.2 million writes to/from a shared global hashtable. Add to that that the network will then also deliver 6.4 million transactions a second from the other nodes. So that's a grand total of 12.8 million. Suppose we say this is allowed to eat 10% of our system time; then we would require a latency equal to handling full-time 12.8 * 10 = 128 million transactions a second, or under 10 nanoseconds each. In reality however the network is a factor 1000 slower in latency, demonstrating the latency problem to its full extent when one uses more than 1 node for game tree search. When using many processors (hundreds), a global shared hashtable is extra important, as a hashtable cutoff prematurely terminates millions of searches that are not needed. The necessity of a global shared hashtable was already noticed clearly by Rainer Feldmann a long time ago. When not using a shared hashtable for the last 10 plies at 460 processors, the search depth loss is about 6 to 7 ply at tournament-level time controls for Diep. The Power6 is a magnificent machine for software that can use big RAM. The Power6 at SARA has hundreds of gigabytes of RAM. If your software can work very well on multiple of those nodes, one should consider buying for $10k a big Tesla setup instead. It has some thousands of cores clocked at 1.x GHz. You can already get one to develop on for just 520 euro. The number of instructions a second Tesla can execute is a lot higher than Power6 can for the sequential game-tree-searching codes. Tesla also gets hands down an IPC of 1.0, higher than Power6. Let's use the IPC of 1.4 at a Core2 2.4GHz; that gives Power6 roughly IPC = 0.75. Those 800 processors execute: 800 * 0.75 * 4.7GHz = 2.8T instructions per second.
That's quite a lot compared to a simple PC, which of course is a zillion times more efficient thanks to its global shared hashtable. A single new card from Nvidia, in Tesla form, does 240 cores * 1.5Ghz = 0.36T instructions a second, and the Tesla setups combine a bunch of those cards, giving a correspondingly larger total instruction rate. This assumes of course that one isn't using double precision floating point: a new nvidia GPU has only 60 processors onboard that can do double precision, not 240.

I'd estimate the search efficiency of MoGo at the same level as Deep Blue. Deep Blue wasn't 11 teraflop, not even close, although its hardware was of course much faster than a PC seen from an instructions-per-cycle viewpoint. It averaged 130 million nodes per second. For the simplistic programs of nowadays, a single node eats roughly 1000 cycles on a core2 at IPC = 1.4; at 1000 cycles a node, that's the yardstick against which to measure those 130 mln nps.
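The post breaks off mid-calculation here, so as a hedged completion under its own assumptions (roughly 1000 core2 cycles per node, the 2.4Ghz core2 used as the yardstick earlier), here is what the 130 mln nps figure works out to:

```python
# Completing the back-of-envelope comparison: how many core2 cores would
# Deep Blue's average nps correspond to? Assumptions are the text's own:
# ~1000 core2 cycles per node for a simplistic program, 2.4 GHz core2.

DEEP_BLUE_NPS = 130_000_000       # average nodes per second quoted for Deep Blue
CYCLES_PER_NODE = 1000            # rough cost of one node on a core2
CORE2_HZ = 2.4e9                  # the 2.4 GHz core2 yardstick from above

core2_nps = CORE2_HZ / CYCLES_PER_NODE           # 2.4 million nps per core2 core
equivalent_cores = DEEP_BLUE_NPS / core2_nps     # roughly 54 core2 cores

print(round(core2_nps), round(equivalent_cores, 1))
```

So by this rough measure Deep Blue's 1997 hardware delivered the raw node throughput of a few dozen modern cores, which is far from 11 teraflop worth of useful search.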
Re: [computer-go] Strength of Monte-Carlo w/ UCT...
On Aug 9, 2008, at 9:45 PM, Don Dailey wrote: I'm curious what you guys think about the scalability of monte carlo with UCT. The MCTS technique appears to be extremely scalable.

The theoretical papers about it claim that it scales up to perfect play in theory. We agree here that this is not true, of course. You can show that random forms of search basically search based upon mobility. The insight behind that is really simple: if you control more area you have more possibilities, so the statistical odds of good outcomes backtracking to the root are bigger than in positions where you control little area. So a very selective searcher equipped with a good mobility function will beat it, as its worst case is better. As you notice correctly, once programs get really strong in certain areas, the randomness of the search tree will backfire, as it creates a worst case in program-versus-program play.

There are however 2 problems to solve to make this happen, maybe a 3rd: a) making a good mobility evaluation function. Not easy in chess nor in go. I must admit that in chess it took me until somewhere in 2004-2005, so more than 11 years after starting with chess, to make one, and that is as a good chessplayer who has put in a lot of time there. In go, where the literature is even more inaccessible to programmers, it will be harder still. b) selective searching is not so easy. How many programs are there that can search ultra-selectively and still improve bigtime in playstrength when given more cpu time? Well, other than Leela, that is...

Historic note: we had randomness in chess too. If you use just mobility, nothing else, not even material values, you soon end up above 2000 elo. Go programs are far from that yet. So until then you will still have to deal with the busloads of guys who claim neural networks work very well as a replacement for search and/or evaluation.
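For readers following the thread who haven't seen UCT spelled out: the selection rule under discussion can be sketched in a few lines. This is the standard UCB1 formula, not any particular program's implementation, and the exploration constant `C` and the dictionary layout are illustrative choices.

```python
import math

def uct_select(children, C=1.4):
    """Pick the child maximizing win_rate + C * sqrt(ln(N) / visits),
    where N is the total visits over all children (UCB1)."""
    total_visits = sum(ch["visits"] for ch in children)

    def ucb(ch):
        if ch["visits"] == 0:
            return float("inf")                  # unvisited moves are tried first
        exploit = ch["wins"] / ch["visits"]      # empirical win rate
        explore = C * math.sqrt(math.log(total_visits) / ch["visits"])
        return exploit + explore

    return max(children, key=ucb)

# Tiny usage example with made-up playout statistics:
children = [
    {"move": "A", "wins": 30, "visits": 50},
    {"move": "B", "wins": 10, "visits": 40},
    {"move": "C", "wins": 0,  "visits": 0},
]
print(uct_select(children)["move"])  # "C": unvisited, so infinite UCB value
```

The point of the thread is precisely that this rule, driven by random playouts, implicitly rewards mobility: positions with more options feed more winning playouts back to the root.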
It took until 1998 or so before great, well-tuned parameters for material in chess were found (done by Chrilly Donninger; note he posted publicly that the tuning didn't matter, just the patterns themselves, an interesting statement at the time). Now the question is: what will happen when someone finds those for the most dominating positional components in computer-go?

Note, Don, that the obvious thing in go, which a lot of humans who are weak at go miss, is that finding the best moves in a position as a subset of all moves is a LOT easier than in chess. So selective search is far easier to do well in go than in chess: it is relatively easy to select each time just 30 to 40 moves that are really relevant. If you look at it like that, then on the same hardware you can easily reach far deeper relevant selective search depths in go than in chess. In chess you simply cannot afford to skip even the most stupid-looking line, because sometimes sacrificing the entire board works; the goal is just conquering the enemy's king. In go however, if you have a won position with some degree of sureness, there is NEAR ZERO need to look further. Hard pruning works far better there.

Effectively, today's top go programs therefore reach selective search depths similar to the depths chessprograms reach at the start of the game, given the same hardware and time controls, and this despite chessprograms getting far higher NPS figures. When today's top chessprograms run on hardware from 1999, the quad xeon 500Mhz that showed up in the world champs, at 3 minutes a move, they totally destroy the field from back then with 100% scores. Not a single draw. So we can argue a lot about search here. Considering the selective search depths today's go programs already reach, I would argue that the most important improvement now is the evaluation function. That doesn't mean it needs to be implemented in the same manner as in chess, nor that it needs to be some sort of huge function.
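The contrast drawn above, keeping only the 30 to 40 relevant moves in go versus keeping everything searchable in chess, amounts to a simple top-k filter on a move-ordering heuristic. A toy illustration, with a stand-in scoring function that is not any real engine's:

```python
# Toy sketch of go-style "hard pruning": rank all candidate moves with some
# heuristic and keep only the best few. The scoring function below is a
# placeholder (it just prefers points near the middle of the board).

def hard_prune(moves, score, keep=35):
    """Keep only the `keep` best-scoring moves (go-style hard pruning)."""
    return sorted(moves, key=score, reverse=True)[:keep]

all_moves = list(range(361))                       # every point on a 19x19 board
pruned = hard_prune(all_moves, score=lambda m: -abs(m - 180))
print(len(all_moves), "->", len(pruned))           # 361 -> 35 candidates
```

With the branching factor cut from 361 to 35, the same node budget buys visibly more plies of depth, which is the effect the paragraph above is describing. In chess the equivalent filter is unsafe because a move the heuristic ranks last can still be the winning sacrifice.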
Considering the go strength of the top authors in computer-go, one would argue they had better implement simple knowledge and tune it very well. That'll kick more butt than a small improvement in search. People who firmly believe in search depth as the absolute truth are interesting people. If I play today's Diep on hardware from 1999 against the Fritz from those days, one of the stronger programs at the time, Fritz would get 17 ply every move versus maybe 12 for my Diep. Hell, I'll even give 7 ply odds when needed; just let me turn on a few extensions so as not to be TOO weak tactically. It won't help the Fritz from those days: it loses bigtime. On paper 7 ply of extra search depth is worth an elo or 700, so today's Diep should stand 0% chance, just on paper.

In chess the center is dominant; real soon everyone figured out that pieces and pawns in the center are worth more than pieces outside it. In go all this is more complicated than in chess. In today's top chess programs, when they do not know what to do, they just put their (light) pieces and pawns in the center.
Re: [computer-go] 2008 World 9x9 Computer Go Championship in Taiwan
From 1997 onwards I managed to join the computer chess world champs every year. Besides participants Stefan MK (Shredder), Shay Bushinsky and Amir Ban (both Junior), tournament director Jaap van den Herik, and Joke Hellemons, who does the entire organisation from the ICGA side, and besides those 'hard core' addicted computerchess guys and women, all of them excellent company in restaurants too, the other few who show up stay at most a year or 3. If we forgive people for sometimes missing 1 or 2 events, then we can also add Gerd Isenberg (who might be missing in Beijing) and David Levy. David especially is even better company in a restaurant; sometimes he pays the bill. Note that at ICGA events it is very easy to find the board (Jaap, David and Joke): in Paderborn 1999 someone needed them, so we advised the guy to search the internet for the most expensive hotel and the most expensive restaurant, and indeed the ICGA board was having dinner in hotel Arosa.

I do not blame authors for not showing up at events. Showing up is expensive, and most authors have many talents and get asked for many different types of events; most events you have to pay for. Most of them only show up when their engine is 'in big shape'. You can be sure that the strongest engines will join an event: an author knows pretty well when his engine stands a chance of doing well. The same will be true for computer-go. The programmers who know they have a chance of winning will all be there for sure.

From the above elite group of ICGA members/participants who show up a lot at the computerchess events, at least 1 is not bribable. Maybe that's why I am looking for a job now. Even having no certainty of income as of now, it would not occur to me, not even at gunpoint, to recognize some other event as the official world championship when it is clear that the entire organisation was officially planned and scheduled for China.
It is not a problem when others organize events; on the contrary, it is even better when others also organize events. It is nice to have many events, as that makes a science more popular. Yet do not forget that the real and official event is in Beijing.

That aside: every year when the location is far away, it is tougher to participate, and the struggle to find a sponsor paying my bills is very tough. Some years I succeed in that; other years I pay it myself. The ticket costs I have are far larger than the amount of money the ICGA pays back. Add a hotel and extra travel time (the flight takes something like 10 hours, so you quickly lose 2 days). Additionally, being born in a nation where it hardly ever gets hot, I cannot stand heat very well; I figured that out in Tel Aviv 2004. In computerchess there is another cost that most of you ignore: hardware. Easy if you come from Japan, the USA, Germany, the UK or France. I come from a tiny nation of 16.5 million inhabitants; there are no high-clocked 8-core machines here that I can get from sponsors to test and play on.

Now computer-go slowly gets more mature, in the sense that engines are starting to beat more human beings, and before long they will beat 99% of all go-players, after which it will still take forever before you can objectively conclude that they are stronger. Of course the average Joe on the street will conclude 10 years before it actually happens that the machine is stronger, as the go world will figure out within a year or 10 from now that the human side has a major weakness that will make it lose 10 years sooner than it should: a human being is in big need of money and will therefore lose for money. This is partly because of arrogance: they do not prepare, and they play silly against the machine, usually making several 10-kyu moves during the game, to still let the game look interesting and especially to give the sponsor an interesting result, instead of beating it bigtime.
The stronger the player, the dumber socially. It will be one of those players who loses. If the best player cannot be bribed, like Topalov, who refused a match under the conditions offered, they'll skip him and wait until the next world champion comes along who is bribable (why do those always have a Russian passport?). Even 10 matches won afterwards by the human side, or losses to lesser grandmasters, won't matter: the first match a world top player loses will get all the attention. The damage will be done, and the average Joe on the street will consider the machine stronger. Joe is always right, because Joe follows the first marketing principle. The first marketing principle says: it is better to be first than to be best (Al Ries and Jack Trout). For a titled player like me, comparable to about 1-4 dan professional (I'm 4 dan only when a game is really important to my team; in all others I'm at most 1), i needed a