I want to build a 7x7 omniscience database so I can "solve" 7x7 Go. I did some calculations and came up with the following:
There are 3^49 = 239,299,329,230,617,529,590,083 possible board states. You can eliminate a large fraction of them by considering a canonical representation of each state, since most states have several equivalents under reflection, rotation, and color reversal. But that still leaves a little more than 14,956,208,076,913,595,599,380 distinct canonical states, assuming one canonical position per 16 equivalent states (8 board symmetries times 2 colorings). The actual reduction factor is a little less than 16, because some positions map to themselves under one or more of the symmetries.

When I was working for MIT I built a Rubik's cube solver using an interesting technique - perhaps it can be applied here. It was a perfect minimal-distance solver, meaning it would give you a minimal-length solution to any supplied cube. I'm not sure how to describe the technique I used, but it had some of the characteristics of a Bloom filter. I essentially started from a solved cube and searched backwards to build a solution table. But it's not what you think - it's not an actual database of solved positions and their depths. Instead, for each position encountered, I would hash each cube "face" to a distinct address in a table of 4-bit entries. A cube "face" was the state of a 1/3 slice of the Rubik's cube (it may have just been the color state of a 3x3 face viewed head-on - I don't remember the detail, and it's not important for this discussion). The entry contained the minimal solution depth over all cubes that had a face hashing to that location. So if an entry held the number 5, it was interpreted as: "any cube containing this face requires at least 5 moves to solve." The table was initialized with the depth of the backward search used to build it. Whatever number was in the table was a guaranteed lower bound, and there was no such thing as a hash-table collision, since anything that hashed to a given location shared the entry. The table was large enough that not too many positions hashed to the same entry, which is good, because each entry holds the smallest depth that ever landed in that slot - collisions only weaken the bound, they never break it.

The search was done iteratively: depth 1, then depth 2, and so on. At each node you would look up each face in the table to determine whether it was pointless to continue. For instance, if you were on the 5th ply of a 10-ply search, you had 5 ply to go. If the table indicated that any face required more than 5 moves to solve, then the whole cube also required more than 5 moves, so you could immediately halt searching that node and save an enormous amount of effort.

One of the students contributed a 2x2 cube database - a 2x2 cube can be seen as the corners of a 3x3 cube. By similar reasoning, if the embedded 2x2 cube of a given position required N moves to solve, then N was a lower bound for any 3x3 cube that contained it. These techniques, combined with some logic to suppress redundant move sequences, enabled us to solve any cube in 2 hours or less - and this was many years ago, when computers were slower and had less memory. One of the students solved hundreds of cubes hoping to find one that required more than 24 moves (I think it was 24). As far as I know, no one has yet proved how many moves the most difficult cube requires, but it is believed to be 24 in the quarter-turn metric, where a move is one outer face twisted a quarter turn - no half turns allowed. We never figured out how to prove this, and we never found a counterexample. A minimal sketch of the table idea appears below.
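To make the pruning-table idea concrete, here is a minimal sketch in Python. Everything in it is a hypothetical stand-in - a toy permutation puzzle instead of a real cube, two halves of the state instead of real faces, one byte per slot instead of 4 bits - but the mechanism (minimum depth per hashed slot, maximum over faces as an admissible lower bound, cutoffs during iterative deepening) is the one described above:

```python
from collections import deque
from itertools import count

# Sanity check on the counts quoted at the top of this post.
assert 3**49 == 239299329230617529590083
assert 3**49 // 16 == 14956208076913595599380

# Toy stand-in for the cube: states are permutations of 8 "stickers",
# moves are three fixed 3-cycles, and a "face" is one half of the state.
SOLVED = tuple(range(8))
CYCLES = [(0, 1, 2), (2, 3, 4), (4, 5, 6)]          # hypothetical move set

def apply_cycle(state, cyc):
    s = list(state)
    a, b, c = cyc
    s[a], s[b], s[c] = s[c], s[a], s[b]             # rotate three stickers
    return tuple(s)

def moves(state):
    return (apply_cycle(state, cyc) for cyc in CYCLES)

def faces(state):
    # The sub-states hashed separately; a real solver would use cube faces.
    return (state[:4], state[4:])

TABLE_SIZE = 1 << 16
BUILD_DEPTH = 6                                     # depth of the backward search

# Every slot starts at BUILD_DEPTH: "anything whose faces were never reached
# needs at least this many moves."  (4-bit packing would halve the memory.)
table = bytearray([BUILD_DEPTH]) * TABLE_SIZE

def slot(face):
    return hash(face) & (TABLE_SIZE - 1)

def build_table():
    """Backward BFS from the solved state.  Each slot keeps the MINIMUM depth
    of any position whose face hashed there, so a lookup is always a valid
    lower bound -- collisions only weaken the bound, they never break it."""
    frontier, seen = deque([(SOLVED, 0)]), {SOLVED}
    while frontier:
        state, depth = frontier.popleft()
        for f in faces(state):
            i = slot(f)
            if depth < table[i]:
                table[i] = depth
        if depth < BUILD_DEPTH:
            for nxt in moves(state):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))

def lower_bound(state):
    # Admissible: if any single face needs N moves, the whole state needs >= N.
    return max(table[slot(f)] for f in faces(state))

def solvable_in(state, budget):
    """One pass of the iterative deepening: prune whenever the table says
    the remaining budget cannot possibly be enough."""
    if state == SOLVED:
        return True
    if budget == 0 or lower_bound(state) > budget:
        return False
    return any(solvable_in(nxt, budget - 1) for nxt in moves(state))

build_table()
scrambled = SOLVED
for i in range(5):
    scrambled = apply_cycle(scrambled, CYCLES[i % 3])
print("minimal solution length:",
      next(d for d in count() if solvable_in(scrambled, d)))
```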
Of course, the 7x7 game of Go is hopeless with the numbers I showed above. Even if a superbly memory-efficient scheme could be hatched, or we had infinite memory to play with, we have a time problem: the number of positions is so enormous that we would never have TIME to fill a table to the point where it started to be minimally useful.

But it's still fun to think about, and there may be techniques based on similar ideas (from the cube solver) that make the 5x5 game solvable with a small database. A full 5x5 database requires about 847 GB of memory (3^25 = 847,288,609,443 positions at one byte each). With board rotations, reflections, and color reversal it comes down to about 53 GB, and you can probably squeeze more than one entry per byte, bringing it down a bit more. If you want a database that just says win or loss at some fixed komi, you can represent each position in 1 bit - but that's no fun and not nearly as useful. Of course, 5x5 has already been solved by computer, and I believe a database has also been built. But it might be interesting to find a solution with much smaller memory requirements, like I did with the Rubik's cube.

One possibility is the creative use of a Bloom filter. A Bloom filter is like a super-compact hash table that cannot prove membership, but can disprove it; when it claims membership, you trust it only to whatever degree you chose when you constructed it. How might you use a Bloom filter? Like this: suppose we could build an evaluation function that gives the correct answer a very high percentage of the time. A Bloom filter (or hash table) could then be consulted to identify the positions the evaluation gets "wrong." You would use a hash table if you wanted the correct answer with no extra computation, but you would use a Bloom filter if you were simply interested in KNOWING that the answer is probably wrong. In the Bloom filter case, you simply continue to search - never stopping on a position the filter identifies as belonging to the set of positions your evaluation function cannot handle correctly. This gives you a solver that uses much less memory but is still fully admissible: it always returns a best move for any position provided to it. (A sketch of this scheme follows below.)

It's likely that you would have a difficult time constructing such an accurate evaluation function, even for 5x5; to be effective, it would have to identify the game-theoretic value correctly a significant percentage of the time. On a small board such as 5x5 there already exist programs that can "almost" solve the entire game using pure search and some admissible evaluation smarts. So all we need is a boost that significantly reduces the amount of searching - a modest improvement may be enough. For instance, if a relatively simple evaluation function could solve just 50% of all positions, it would be a big win. The Bloom filter would need at least twice as many bits as there are positions we can't solve - not a huge savings over a full table, but a starting point: a significant memory reduction, with a large searching component still required. It gets more interesting if we can build an evaluation function that gets a HEAVY percentage of the positions correct; we will need that if we hope to get by with the modest memory available in most home computers today. I suspect an evaluation function like this can be made very accurate after a few moves have been played, but not in the first few moves. That is good enough - it just means our solver will search a few moves deep before finding positions it can stop on.
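Here is a minimal sketch of that filter scheme in Python, with hypothetical stand-ins for every Go-specific piece: positions are just integers, `true_value` plays the role of the expensive game-theoretic answer, and `evaluate` is the cheap guess. The one property that carries the whole argument is that a Bloom filter has no false negatives, so any position the filter clears is guaranteed to be one the evaluation gets right:

```python
import hashlib

class BloomFilter:
    """Answers 'definitely not in the set' or 'probably in the set'.
    There are no false negatives, so trusting a clear answer is always safe."""
    def __init__(self, num_bits, num_hashes):
        self.m, self.k = num_bits, num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _slots(self, key):
        # Double hashing: derive k indexes from one 128-bit digest.
        d = hashlib.blake2b(repr(key).encode(), digest_size=16).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for s in self._slots(key):
            self.bits[s >> 3] |= 1 << (s & 7)

    def might_contain(self, key):
        return all(self.bits[s >> 3] & (1 << (s & 7)) for s in self._slots(key))

# Hypothetical stand-ins: integers for positions, parity for the true result.
def true_value(pos):                    # the expensive game-theoretic answer
    return pos % 2

def evaluate(pos):                      # a cheap guess, wrong on some positions
    return 0 if pos % 7 == 0 else pos % 2

positions = range(100_000)
wrong = [p for p in positions if evaluate(p) != true_value(p)]

# Per the sizing argument above: about 2 bits per position we get wrong.
filt = BloomFilter(num_bits=2 * len(wrong), num_hashes=1)
for p in wrong:
    filt.add(p)

# At solve time: stop and trust the evaluation ONLY when the filter proves
# the position is not in the wrong-set; otherwise keep searching below it.
trusted = [p for p in positions if not filt.might_contain(p)]
assert all(evaluate(p) == true_value(p) for p in trusted)    # admissible
print(f"can stop immediately on {len(trusted)} of {len(positions)} positions")
```

The false positives just cost extra search on positions the evaluation actually had right, which is exactly the trade the text describes: memory for time, with correctness never at risk.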
Since the evaluation function is applied to EVERY position, many will hit on the correct score by virtue of an educated guess and a little dumb luck. So we would still encounter many positions early in the tree that we could stop on, saving a great deal of work.

But I think we can do better. It's one thing to make an evaluation function that returns the correct answer, but we can also try to take better advantage of the estimated score that IS returned. A significantly smaller filter could be designed if we imposed fewer constraints on the required accuracy. For instance, if we allowed the evaluation to be off by up to 2 points in either direction before considering it "incorrect," we would have a much smaller Bloom filter and many more evaluations sanctioned as usable. The alpha-beta search would have to account for this: each trusted evaluation now yields an upper and a lower bound on the value of the position rather than an exact score, and such bounds lead to rich cutoffs in an alpha-beta minimax search. This could reduce the effective branching factor enormously and perhaps make it possible to search any position in a reasonable amount of time and memory.

If the evaluation function could also be designed to return a confidence factor, we get another win in efficiency and in the size of the Bloom filter. If the evaluation function ADMITS that it doesn't understand the position, there is no need to store this position in the filter (even if there is a small chance it is correct). Depending on how the Bloom filter is constructed, we may save 2 or 3 bits for every position that we KNOW isn't likely to be correct. Of course we now have to search these positions, but we are betting that we probably would have had to search them anyway.

The feasibility of this approach rests almost entirely on how accurate we can make a "best guess" evaluation function. The method is perfectly workable no matter how bad the evaluation function is, but that doesn't mean it's a win in terms of resources. The goal is a solver that is significantly faster than the existing brute-force solvers and uses significantly less memory than a 5x5 omniscient database, using a hybrid combination of both techniques. Of course there are also problems in determining how to actually CONSTRUCT the database initially while keeping the memory requirements low, but I think it is doable. A sketch of the bounded-evaluation idea appears below.
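Here is what the "off by up to 2 points" version might look like inside an alpha-beta search, as a self-contained Python sketch. The game tree, leaf scores, and the noisy evaluation are all made-up stand-ins; `bounds` stands in for the evaluation-plus-filter machinery, and the point is just the two extra cutoff tests, which are sound whenever the true value is guaranteed to lie between the bounds. A position the filter flags, or that the evaluation has no confidence in, would simply get the useless bounds (-INF, +INF):

```python
import random

INF = 10**9
TOLERANCE = 2                 # the "off by up to 2 points" allowance

# Hypothetical toy game: a depth-4 ternary tree with small integer scores.
def children(pos):
    depth, seed = pos
    return [] if depth == 4 else [(depth + 1, seed * 5 + i) for i in range(3)]

def score(pos):               # exact score, used only at the leaves
    return (pos[1] * 2654435761) % 21 - 10

def exact(pos):               # ground truth by plain negamax, for comparison
    kids = children(pos)
    return score(pos) if not kids else max(-exact(c) for c in kids)

def bounds(pos):
    """Stand-in for evaluation + Bloom filter: pretend the filter cleared
    this position, so the true value is within TOLERANCE of a noisy guess.
    A flagged or low-confidence position would return (-INF, INF) instead."""
    e = exact(pos) + random.randint(-TOLERANCE, TOLERANCE)
    return e - TOLERANCE, e + TOLERANCE

def alphabeta(pos, alpha, beta):
    lo, hi = bounds(pos)
    if lo >= beta:            # even the pessimistic bound fails high: cut off
        return lo
    if hi <= alpha:           # even the optimistic bound fails low: cut off
        return hi
    kids = children(pos)
    if not kids:
        return score(pos)
    best = -INF
    for c in kids:
        best = max(best, -alphabeta(c, -beta, -alpha))
        alpha = max(alpha, best)
        if alpha >= beta:     # ordinary alpha-beta cutoff
            break
    return best

random.seed(1)
root = (0, 1)
print("plain negamax:", exact(root),
      " alpha-beta with bound cutoffs:", alphabeta(root, -INF, INF))
```

The two printed values agree: the bound cutoffs only ever return sound fail-soft bounds, so the root value is exact while large parts of the tree are never visited.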
Just some stuff to think about. I'm tempted to go for the 5x5 database using these or similar ideas. Naturally, 7x7 is not feasible using any of these ideas, but at least I got you to read this far ....

- Don