Hmmm, screenshot doesn't seem to be getting through to the list, so here's a textual graphic instead.
     A B C D E F G H J K L M N O P Q R S T
   +---------------------------------------+
19 | . . . . . . . . . . . . . . . . . . . |
18 | . . . . . . . . . . . . . . . . . . . |
17 | . . . . . . . . . . . . . X . . . . . |
16 | . . . . X . . . . . . . . . . O . . . |
15 | . . . . . . . . . . . . . X . . . . . |
14 | . . . . . . . . . . . . . . . . O . . |
13 | . . . . . . . . . . . . . . . X O . . |
12 | . . . . . . . . . . . . . . . X O . . |
11 | . . . . . . . . . . . . . . . a . . . |
10 | . . . . . . . . . . . . . . . . . . . |
 9 | . . . . . . . . . . . . . . . . . . . |
 8 | . . . . . . . . . . . . . . . . . . . |
 7 | . . . . . . . . . . . . . . . . . . . |
 6 | . . . . . . O . . . . . . . . . . . . |
 5 | . . . . . O X b . . . . . . . . . . . |
 4 | . . . X X X O O . . . . . . . . . . . |
 3 | . . . . X O . . . . . . . . . O . . . |
 2 | . . . . . . . . . . . . . . . . . . . |
 1 | . . . . . . . . . . . . . . . . . . . |
   +---------------------------------------+

Black (X) to move. The screenshot showed that Leela's policy net put about 96% probability on a and only 1.03% on b, and that even after nearly 1 million simulations it had basically not searched b at all.

On Mon, Apr 17, 2017 at 9:04 AM, David Wu <lightvec...@gmail.com> wrote:

> I highly doubt that learning refutations to pertinent bad-shape patterns
> is much more exponential or combinatorially huge than learning good shape
> already is, if you could somehow be appropriately selective about it.
> There are enough commonalities that it should "compress" very well into
> what the neural net learns.
>
> To draw an analogy with humans: humans seem to have absolutely no issue
> learning refutations to lots of bad shapes right alongside good shapes,
> and don't seem to find it orders of magnitude harder to apply their
> pattern-recognition abilities to that task.
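As a rough illustration of why a ~1% prior can starve a move of simulations: under an AlphaGo-style PUCT selection rule (a sketch only; Leela's actual selection formula may differ, and the Q values below are hypothetical, held equal so that only the prior drives the allocation), the visit count a child receives scales roughly with its prior:

```python
import math

# Toy PUCT selection: repeatedly pick the child maximizing
#   Q + c_puct * P * sqrt(N_total) / (1 + N_child)
# Priors match the position above (96% on 'a', 1.03% on 'b').
priors = {"a": 0.96, "b": 0.0103}
q = {"a": 0.50, "b": 0.50}   # hypothetical, equal value estimates
visits = {"a": 0, "b": 0}
c_puct = 1.0

for total in range(1, 200_001):
    def score(m):
        return q[m] + c_puct * priors[m] * math.sqrt(total) / (1 + visits[m])
    visits[max(priors, key=score)] += 1

# At equilibrium the scores equalize, so (1 + N_b) / (1 + N_a) ~ P_b / P_a,
# i.e. 'b' gets on the order of 1% of the simulations.
print(visits["b"] / visits["a"])
```

So even with a million playouts, b would receive only on the order of 1% of them; if its value only becomes apparent after deep search, as with a long ladder, that is nowhere near enough.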
> Indeed, people learn bad-shape refutations all the time by performing
> "training updates" based on their own reading about what works or not
> every time they read things in a game (almost certainly something the
> human brain "does" to maximize the use of data), as well as from the
> results of their own mistakes, and they also learn refutations of
> non-working tactics all the time if they do things like tsumego. But the
> distribution of bad moves that people devote their attention to learning
> to refute is extremely nonuniform - e.g. nobody studies what to do if
> the opponent plays the 1-2 point against your 4-4 point - and I bet
> that's important too.
>
> I'm also highly curious whether anyone has experimented with or found
> any ways to mitigate this issue!
>
> ----------
>
> If you want an example of this actually mattering, here's an example
> where Leela makes a big mistake in a game that I think is due to this
> kind of issue. Leela is white.
>
> The last several moves to reach the screenshotted position were:
> 16. wG6 (capture stone in ladder)
> 17. bQ13 (shoulder hit + ladder breaker)
> 18. wR13 (push + restore ladder)
> 19. bQ12 (extend + break ladder again)
> 20. wR12 (Leela's mistake, does not restore ladder)
>
> You can see why Leela makes the mistake at move 20. On move 21 it pretty
> much doesn't consider escaping at H5. The policy net puts it at 1.03%,
> which I guess is just a little too low for it to get some simulations
> here even with a lot of thinking time. So Leela thinks black is behind,
> considering moves with 44% and 48% overall win rates. However, if you
> play H5 on the board, Leela almost instantly changes its mind to 54% for
> black. So not seeing the H5 escape on move 21 leads it to blunder on
> move 20.
> If you go back to move 20 and force Leela to capture the ladder, it's
> happy with the position for white, so it's also not as if Leela dislikes
> the ladder capture; it just wrongly thinks R12 is slightly better because
> it doesn't see the escape.
>
> To some degree this maybe means Leela is insufficiently explorative in
> cases like this, but still, why does the policy net not put H5 above
> 1.03%? After all, it's vastly more likely than 1% that a good player
> will see that the ladder works here and try escaping in this position.
> I would speculate that this is because of biases in training. If Leela
> is trained on high-amateur and pro games, then one would expect that in
> nearly all training games, conditional on white playing a move like R12
> and ignoring the ladder escape, the ladder is not too important and
> therefore black should usually not escape. By contrast, when the ladder
> escape is important, white will capture at H5 instead of playing R12 and
> not allow the escape in the first place. So there is rarely a training
> example that allows the policy net to learn that the ladder escape is a
> likely move for black here.
>
> On Mon, Apr 17, 2017 at 7:04 AM, Jim O'Flaherty
> <jim.oflaherty...@gmail.com> wrote:
>
>> It seems chasing down good moves for bad shapes would be an explosion
>> of "exception cases", i.e. combinatorially huge. So, while you would
>> save some branching in the search space, you would balloon the number
>> of patterns to scan for by orders of magnitude.
>>
>> Wouldn't it be preferable to just have the AI continue to make the
>> better move emergently and generally from probabilities around win
>> placements, as opposed to what appears to be a focus on one type of
>> local optimum?
>>
>> On Mon, Apr 17, 2017 at 5:07 AM, lemonsqueeze <lemonsque...@free.fr>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm sure the topic must have come up before, but I can't seem to find
>>> it right now; I'd appreciate it if someone could point me in the right
>>> direction.
>>>
>>> I'm looking into MM, LFR, and similar CPU-based pattern approaches for
>>> generating priors, and was wondering about basic bad shape:
>>>
>>> Let's say we use 6d games for training. The system becomes pretty good
>>> at predicting 6d moves by learning patterns associated with the kinds
>>> of moves 6d players make.
>>>
>>> However, it seems it doesn't learn to punish basic mistakes
>>> effectively (say, double ataris or shapes with obvious defects),
>>> because they almost never show up in 6d games =) They do show up in
>>> the search tree, though, and without good answers the search might
>>> take a long time to realize these moves don't work.
>>>
>>> Maybe I missed some later paper / development, but basically:
>>> wouldn't it make sense to also train on good answers to bad moves?
>>> (Maybe harvesting them from the search tree, or something like that.)
>>>
>>> I'm thinking about basic situations like this, which patterns should
>>> be able to recognize:
>>>
>>>      A B C D E F G H J K L M N O P Q R S T
>>>    +---------------------------------------+
>>> 19 | . . . . . . . . . . . . . . . . . . . |
>>> 18 | . . . O O . O . . . . . . . . . . . . |
>>> 17 | . . X . X O . X O . . . . . . . . . . |
>>> 16 | . . X . X O . O O . . . . . . X . . . |
>>> 15 | . . . . . X X X X . . . . . . . . . . |
>>> 14 | . . X . . . . . . . . . . . . . . . . |
>>> 13 | . O . . . . . . . . . . . . X . O . . |
>>> 12 | . . O . . . . . . . . . . . . . O . . |
>>> 11 | . . O X . . . . . . . . . . . X . . . |
>>> 10 | . . O X . . . . . . . . . . . X O . . |
>>>  9 | . . O X . . . X . . . X O . . X O . . |
>>>  8 | . . O X . . . O X X X O X)X X O O . . |
>>>  7 | . O X . . . . . O O X O . X O O . . . |
>>>  6 | O X X . X X X O . . O . . . X O X . . |
>>>  5 | . O O . . O X . . . . . O . . . . . . |
>>>  4 | . X O O O O X . . . O . . O . O . . . |
>>>  3 | . X X X X O . . X . X X . . . . . . . |
>>>  2 | . . . . X O . . . . . . O . . . . . . |
>>>  1 | . . . . . . . . . . . . . . . . . . . |
>>>    +---------------------------------------+
>>>
>>> Patterns probably never see this during training and miss W L9.
>>> For example:
>>>
>>> In Remi's CrazyPatterns.exe, L9 comes in 4th position:
>>> [ M10 N10 O6 L9 ...
>>>
>>> With Pachi's large patterns it's 8th:
>>> [ G8 M10 G9 O17 N10 O6 J4 L9 ...
>>>
>>> Cheers,
>>> Matt
>>>
>>> ----
>>>
>>> MM: Computing Elo Ratings of Move Patterns in the Game of Go
>>> https://www.remi-coulom.fr/Amsterdam2007/
>>>
>>> LFR: Move Prediction in Go - Modelling Feature Interactions Using
>>> Latent Factors
>>> https://www.ismll.uni-hildesheim.de/pub/pdfs/wistuba_et_al_KI_2013.pdf
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go