Too easy ;)

One of the points in patch-space corresponds to X=center, Y=center, Scale=huge, so this patch is a rescaled version (say 20x20) of the whole image (say 1000x1000). In this 20x20 patch, the letter 'A' emerges naturally and can be reconstructed by the NN, and therefore recognized. It will probably be salient, since it is far away from the small A's along the Scale dimension of patch-space, and far-away points in patch-space don't battle for salience. Your second example is solved analogously.
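To make the "far-away points don't battle" part concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the Patch record, the log-scale distance, the scale_weight, radius and max_energy knobs, and the energy numbers (which stand in for what a trained net would report, with lower meaning a better reconstruction). It is one possible salience heuristic, not a worked-out method.

```python
import math
from typing import NamedTuple


class Patch(NamedTuple):
    """One point in patch-space (hypothetical record, for illustration)."""
    x: float       # window centre, in original-image pixels
    y: float
    scale: float   # window side length, in original-image pixels
    energy: float  # stand-in for the net's output; lower = better here


def distance(a: Patch, b: Patch, scale_weight: float = 100.0) -> float:
    """Distance in (x, y, log2 scale) space; scale_weight is a made-up knob."""
    ds = scale_weight * abs(math.log2(a.scale) - math.log2(b.scale))
    return math.sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + ds ** 2)


def filter_salient(patches, radius=50.0, max_energy=0.5):
    """Greedy local battle: the lowest-energy patch wins its neighbourhood."""
    winners = []
    for p in sorted(patches, key=lambda q: q.energy):
        if p.energy > max_energy:
            break  # sorted by energy, so all remaining patches are worse
        # A patch survives only if no earlier winner is within `radius`.
        if all(distance(p, w) > radius for w in winners):
            winners.append(p)
    return winners


# Made-up energies; a real system would get them from the trained net.
patches = [
    Patch(500, 500, 1000, 0.10),  # whole image rescaled: the huge 'A'
    Patch(500, 500, 20, 0.05),    # a small 'A' at the centre
    Patch(505, 500, 20, 0.20),    # the same small 'A', window shifted a bit
    Patch(100, 900, 20, 0.90),    # background, poorly reconstructed
]
winners = filter_salient(patches)
for w in winners:
    print(w)
```

The shifted window loses to its neighbour at the same scale, while the huge 'A' and the small 'A' share a centre but sit far apart along the scale axis, so both survive.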
Okay, time for dinner now. Vision solved :)

Regards,
Durk

On Mon, Mar 3, 2008 at 7:59 PM, Richard Loosemore <[EMAIL PROTECTED]> wrote:
> Kingma, D.P. wrote:
> > On Mon, Mar 3, 2008 at 6:39 PM, Richard Loosemore <[EMAIL PROTECTED]> wrote:
> > >
> > > The problems with bolting together NN and GA are so numerous it is
> > > hard to know where to begin. For one thing, you cannot represent
> > > structured information with NNs unless you go to some trouble to add
> > > extra architecture. Most NNs can only cope with single concepts
> > > learned in isolation, so if you show a visual field containing 5,000
> > > copies of the letter 'A', all that happens is that the 'A' neuron fires.
> > >
> > > If you do find some way to get around this problem, your solution
> > > will end up being the tail that wags the dog: the NN itself will fade
> > > into relative insignificance compared to your solution.
> >
> > Well, you could achieve that (5,000 registrations of the letter 'A' with
> > their corresponding positions in the image) by using a sliding window
> > over multiple rescaled (and maybe otherwise transformed) versions of
> > the input image. This way, you get an image patch for each window and
> > scale (and maybe other transformations), and each patch can be given a
> > corresponding position in a multidimensional space (e.g., an image patch
> > with X and Y position and scale S is a point in 3-dimensional space).
> > For each of the produced points (patches) in the space, run the neural
> > net to produce a lower-dimensional code and corresponding energy
> > (= reconstruction quality). Now filter this space by letting the points
> > have local battles for salience using some heuristic (e.g. lower energy
> > means higher salience) and filter out the low-salience points. This
> > produces a filtered space with fewer points than the previous one, each
> > point containing a lower-dimensional code.
> >
> > In the example of the letter 'A', the above method would recognize all
> > 5,000 versions while remembering their individual input positions. This
> > presumes the neural net is properly trained on the letter 'A' and can
> > properly reconstruct them (using Hinton's method). This should produce
> > 5,000 registrations of the letter 'A', while filtering out unimportant
> > information.
> >
> > But you could take it a step further. For each image input, the above
> > method creates a filtered, 3-dimensional space with points containing
> > low-dimensional codes. This space can then again be harvested by taking
> > patches, with each patch containing /n/ points and each point containing
> > an /m/-dimensional code, so each patch is (/m/*/n/)-dimensional. A neural
> > net can be trained on lowering the dimension of these patches from
> > (/m/*/n/) to something lower-dimensional. This process is quite similar
> > to the one in the previous paragraph.
> >
> > What could /possibly/ go wrong? :)
> >
> > Regards,
> > Durk Kingma
>
> Excellent! Sounds like a perfect solution ;-).
>
> Oh, wait!
>
> What about......... if the scene is structured in such a way that the
> 5,000 copies of the letter 'A' were actually scattered around in such a
> way that most (but not all) of them were arranged to form a huge letter
> 'A'?
>
> Would it then count 5,001 copies?
>
> Oh, and one more thing I forgot to mention that is in the same scene
> (how could I forget this one?): there are also a couple of women
> standing side by side, leaning against each other with their shoulders
> touching and keeping their bodies stiff and straight, forming the two
> sides of a letter 'A', and holding a model of a horizontally reclining
> woman between them at waist height, to form the crossbar of a letter 'A'.
>
> Could we get the NN to recognize, in the context of the overall scene,
> that here were actually 5,002 copies of the letter 'A'......?
>
> And if the scene had one single, rather small letter B over in the
> corner, would the NN find this funny?
>
> You have 30 minutes to devise an algorithm, Durk... :-).
>
>
> Richard Loosemore
>
> -------------------------------------------
> agi
> Archives: http://www.listbox.com/member/archive/303/=now
> RSS Feed: http://www.listbox.com/member/archive/rss/303/
> Modify Your Subscription:
> http://www.listbox.com/member/?&
> Powered by Listbox: http://www.listbox.com
