Mark Boon wrote: > Thanks for the pointer Don, that might be worth looking into at some > point. > > When you say 'amazing accuracy despite this speed and simplicity' I > still have to ask: "how fast?". I think 10usec is actually pretty > fast. But when talking about using it during MC playouts for example, > it's still rather slow. This is due to the pile-up of numbers. A > (relatively) big area to look for, and a large number of occasions to > do the lookup. > > The patterns don't necessarily need to be hand-crafted. But throwing a > million generated patterns at it also has its price in terms of > performance. Pattern lookup has nothing to do with the number of patterns it was trained on of course. Think of this as a kind of neural net. The data produced is compact, and is a function of the number of attributes, and the number of classes (this is a classifier) and not the amount of patterns used for training.
Therefore 10 million training patterns would take the same storage as 1 million or even just one. I generate 16 versions of each pattern for each possible color and orientation and the training time, I'm guessing, is dominated by the time it takes to read the training file (a text file) into memory, parse the fields, and rotate the patterns. Of course I don't really care about the training time itself since this is a one time operation but it's fast nevertheless, not like training a neural net or running a genetic algorithm. All it is, is parsing data and collecting it into a data structure to be able to calculate probabilities from. If you are just interested in patterns generated the old fashion way, hand crafted without learning or generalization (other than wild cards), then this has nothing to offer for you. The techniques for doing this have been around for decades and you are an expert in them and there is nothing new in this. David Fotland, and many of the top GO people have very fast pattern matchers that work well and are super-optimized to do what they do very quickly. I definitely agree with you, when it comes to pure state of the art pattern matching (the old fashioned way) that variable pattern shapes with wild-card matching is very important. This is all fine for a programmer who is also an expert GO player, but for someone like myself there is no other choice (other than taking a few years to become a GO expert) but to explore this in a machine learning context. Naive Bayes is called "naive" for a reason. It makes assumptions about the independence of observations that are not true, however it has been shown both empirically and formally that it can very often perform surprisingly well despite these limitations even in domains where attributes are highly correlated. In GO, patterns attributes of course are highly correlated probably to an absurd degree, but there are many simple enhancements that correct the shortcomings while maintaining most of the simplicity. But it is a whole different thing from what you are doing. - Don > > Mark > > > On 26-mrt-08, at 16:17, Don Dailey wrote: > >> Mark, >> >> I am doing some experimentation with something similar to patterns, but >> based on Naive Bayes classifiers. The idea is to use Naive Bayes >> classifiers to generalize patterns. The classifiers would still be >> focused on some constrained area of the board, such as the 5x5 matrix or >> something, but they would be extremely compact compared to patterns and >> very good at generalizing. I'm convinced they would have to be enhanced >> with additional attributes concerning the points being considered, but >> one of their strengths is that you can pile on huge numbers of >> attributes without killing the speed or memory consumption very >> significantly. >> >> Recently there has been a huge surge of interest in Naive Bayes >> Classifiers due to their simplicity and speed, and yet amazing accuracy >> despite this speed and simplicity. Nothing has been found that >> improves all that much on Naive Bayes for many applications, and some >> simple improvements to Naive Bayes has put it in the same league as >> other far more complex and computationally expensive methods such as >> neural networks and decision trees. >> >> I have no idea whether I'm barking up the wrong tree - but I think it's >> definitely worth taking a look at. I don't think GO programs can be >> improved much more by simply adding more hand crafted patterns - we need >> to find ways to generalize knowledge in powerful ways. >> >> Naive Bayes is trained by example and it's trivial, basically a single >> pass statistics gathering. So you must basically show it a gazillion >> sample patterns with "known" classifications. You could build these >> from games of strong players for instance. >> >> - Don >> >> >> >> >> Mark Boon wrote: >>> Lately I have been putting some effort into pattern-matching. Although >>> I have made progress, the result was not as good as what I had hoped >>> for by about an order of magnitude. This makes me wonder what is >>> currently actually the state of the art of pattern matching in Go. >>> >>> Of course it's a bit of a vague question, as there are many possible >>> forms of pattern-matching. Fixed-size patterns are the easiest since a >>> hash-code can be used. Nothing is going to beat that in terms of >>> speed, but its use is limited to some special occasions. Joseki is one >>> and apparently 3x3 patterns are used in Monte-Carlo programs. >>> >>> But I think the most generally useful form is one that can do >>> pattern-matching for variable shapes. (Or which can have don't-care >>> points if you like.) I thought I had a solution that would be pretty >>> close to the theoretical best performance. Formally proving that would >>> probably be a dissertation in itself, most important for me is in >>> itself it works and with modest memory requirements. That is the good >>> part. The bad part is, if I'm right this is the best it can get I'm a >>> bit disappointed it isn't faster. I'd rather be proven wrong here. >>> It's written in Java so making it in C would possibly give a two-fold >>> speedup, but that's all I can think of. >>> >>> What I have now takes 10-15 microseconds on a 2Ghz Intel computer to >>> find all the patterns on a board (on average for 9x9, for 19x19 it's >>> more like 15-20 usec) and it also gives me the 'new' patterns i.e. >>> patterns that match now but didn't match the previous move (I believe >>> that's a useful feature, but it's a detail). The set of patterns is >>> under a thousand patterns. Somehow smaller sets don't go much faster, >>> but larger sets do slow down, every ten-fold increase in number of >>> patterns seems to double the time. >>> >>> So I was wondering if others cared to share the performance of their >>> pattern-matcher. I just want to find out if I'm chasing a unicorn or >>> not by trying to make something faster. >>> >>> Mark >>> >>> >>> _______________________________________________ >>> computer-go mailing list >>> computer-go@computer-go.org >>> http://www.computer-go.org/mailman/listinfo/computer-go/ >>> >> _______________________________________________ >> computer-go mailing list >> computer-go@computer-go.org >> http://www.computer-go.org/mailman/listinfo/computer-go/ > > _______________________________________________ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > _______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/