A model-based approach is necessary, but not sufficient.  Equally important 
will be using parallax to divide the visual field into objects which move 
together.  This can be done with one camera, by oscillating its position, 
though two cameras add significantly.  Three allow for better positioning in 
a space which includes the vertical as well as the horizontal and depth.  
Still, most of the benefit could be derived from one camera, with a bit of 
oscillation.
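
To put a rough number on the one-camera case, here is a minimal sketch.  It 
assumes an idealized pinhole camera that slides sideways, parallel to the 
image plane, while the scene stays still.  The function and parameter names 
are mine, purely for illustration.

```python
def parallax_shift(focal_px: float, baseline: float, depth: float) -> float:
    """Image shift, in pixels, of a static point when the camera slides
    sideways by `baseline` (same length unit as `depth`).

    Assumes an ideal pinhole camera moving parallel to the image plane.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline / depth


def depth_from_shift(focal_px: float, baseline: float, shift_px: float) -> float:
    """Invert the relation above: a larger shift means a closer point."""
    if shift_px <= 0:
        raise ValueError("shift must be positive")
    return focal_px * baseline / shift_px
```

With an assumed focal length of 800 pixels and a 5 cm sway, a point 2 m away 
shifts by about 20 pixels and a point 20 m away by about 2, which is why even 
a small oscillation separates near things from far ones.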

N.B.:  Edges are never sharp.  Real objects don't have sharp visual edges 
(living things have frequently evolved to avoid them), and even simple blocks 
aren't simple (unless you happen to see them straight on).

The basic technique, as I see it, is to take a series of images of the same 
area with the camera in slightly different positions.  Not much different, 
but slightly.  People do this by tilting their torso or by taking a step (and 
bouncing up and down during the process).  Edge detectors and feature 
extractors operate on these images to create ensembles of characteristics 
that move together against a background (which moves differently).  The 
general presumption is that closer "objects" show faster apparent motion, and 
that things which move together are the same object.  (Arms, legs, etc., will 
cause this to need refinement, but that's the first approximation.)  Once you 
have the visual field divided into objects, then you attempt to recognize 
them...possibly subdividing them into object collections.
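
As a first approximation, the grouping step might be sketched like this.  It 
assumes the edge detectors and feature extractors have already matched each 
feature between two camera positions and reported its image displacement.  
The greedy clustering rule, the `tol` threshold and all of the names are my 
own illustrative choices, not an established algorithm.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class Group:
    """Features that move together, with their running mean displacement."""
    members: list = field(default_factory=list)  # indices into the input list
    dx: float = 0.0
    dy: float = 0.0

    def add(self, index: int, dx: float, dy: float) -> None:
        n = len(self.members)
        self.dx = (self.dx * n + dx) / (n + 1)
        self.dy = (self.dy * n + dy) / (n + 1)
        self.members.append(index)

    @property
    def speed(self) -> float:
        return hypot(self.dx, self.dy)


def group_by_motion(displacements, tol=1.0):
    """Greedily group features whose displacement is within `tol` pixels of
    a group's mean.  Returns groups sorted fastest first, i.e. presumed
    nearest first; the slowest group is the likely background.
    """
    groups = []
    for index, (dx, dy) in enumerate(displacements):
        for group in groups:
            if hypot(dx - group.dx, dy - group.dy) <= tol:
                group.add(index, dx, dy)
                break
        else:
            # No existing group moves like this feature: start a new one.
            new_group = Group()
            new_group.add(index, dx, dy)
            groups.append(new_group)
    groups.sort(key=lambda g: g.speed, reverse=True)
    return groups
```

Note that this sketch knows nothing about arms and legs: a limb that swings 
differently from its torso lands in its own group, which is exactly where the 
refinement mentioned above would have to come in.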

Your current model of the world helps you to predict what might be found among 
the current objects...and what you should notice the absence of.  At this 
point, you assign significance to some parts, and ignore the rest 
(limitations on processing power).  For example, the tire that's swinging 
back and forth as the tree's branches blow in the wind is ignored, but the 
dog that you're calling to dinner is noticed.  You should also notice that 
an unexpected fire has blown up in the barbecue pit.  The dog was expected, 
and predicted by your model (quick and easy recognition).  The fire was 
unexpected, so you need to dig through your model to recognize whether it's 
significant or not.  
The piece of paper blowing overhead is eventually recognized as a kite...it 
takes longer to recognize it, because it wasn't expected, but it's discarded 
as irrelevant in the face of the obstinate dog and the unexpected fire that 
needs to be extinguished.  You probably won't later remember even noticing 
it.
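
A toy sketch of that triage, assuming the model can be boiled down to a list 
of expected labels plus a significance score per label.  Both of those, the 
`threshold` cutoff and the "quick"/"slow" tags are hypothetical 
simplifications of mine; a real world model would be far richer.

```python
def triage(observed, expected, significance, threshold=0.5):
    """Split observed labels into attended and ignored, and list what was
    expected but not seen.

    Each observed label is tagged "quick" if the model predicted it and
    "slow" if it was unexpected and needs a deeper search of the model.
    """
    attended, ignored = [], []
    for label in observed:
        effort = "quick" if label in expected else "slow"
        score = significance.get(label, 0.0)  # unknown things default to 0
        bucket = attended if score >= threshold else ignored
        bucket.append((label, effort))
    # Most significant first, since processing power is limited.
    attended.sort(key=lambda item: significance.get(item[0], 0.0), reverse=True)
    missing = [label for label in expected if label not in observed]
    return attended, ignored, missing
```

Feed it the scene above with made-up scores (fire 1.0, dog 0.8, tire swing 
and kite 0.1 each) and the fire and the dog are attended to, in that order, 
while the tire swing and the kite are dropped.  The kite still carries the 
"slow" tag: it cost more to recognize, and was discarded anyway.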

(Yes, I know that we don't have anything up to this yet.  But this is where we 
are headed.)

On Monday 13 February 2006 04:35 am, Yan King Yin wrote:
> Hi Ben et al
>
> I have been thinking about the vision problem, it seems that the
> model-based approach is most promising.  After studying a lot of real
> digital pics, I have confidence that, with this approach, a vision system
> can be developed that can recognize almost everything humans can (with
> proper training of course).
>
> I think the appropriate interface between the vision module and the AGI
> would be a "*physical world model*"  (PWM).  Such a model is composed of
> objects which are in turn composed of geons*.  The PWM contains knowledge
> about physical objects, for example the fact that a TV is usually a
> "block", with a slightly concave screen, and some buttons which are small
> cubes/cylinders.  Well, an old-fashioned TV.  Or, a bottle is usually a
> cylinder with an empty "inside", and is topologically isomorphic with many
> other types of containers.
>
> I don't know if Novamente currently has such a PWM (perhaps by another
> name).  Anyway, my vision module has to interact with the PWM.  The main
> function of the vision module is to map the geon-based model to
> *appearances*.  I've had this part roughly figured out.
>
> Novamente's part is to help construct such a world model.  We need to find
> out how to represent objects, geons (which I can handle), and the
> interrelations between objects.  Let's avoid doing overlapping work...
>
> Let's discuss this here to flesh out the details.  That would speed things
> up a lot.  Then I'll put everything on a web site.  If I have time I'll do
> a presentation, but I think a paper is most important.
>
> *geons:  by this I mean a set of geometric primitives more general than
> the shapes commonly defined as geons in the literature.  My geons are like
> partial "motifs" that can compose objects -- slightly lower-level than
> common geons.
>
> yky
>
> -------
> To unsubscribe, change your address, or temporarily deactivate your
> subscription, please go to
> http://v2.listbox.com/member/[EMAIL PROTECTED]
