Re: [agi] It's Visual Scene Recognition, Stupid

Matt Mahoney Fri, 05 Apr 2013 09:52:27 -0700

Yes, you're right. That's why I estimated the number of visual
features to be larger than our vocabulary. Objects in the picture
(policeman, yellow tape) are not the top level. They are below the
level of "crime scene".


On Fri, Apr 5, 2013 at 7:01 AM, Mike Tintner <[email protected]> wrote:
> Here's what I was reaching for in my last post to Matt - it represents a
> profound change in vision philosophy  (and helps explain why current
> AI/scientific approaches to vision (& everything else) are so mindblowingly
> fragmented and simplistic).
>
> Vision is always "visual scene recognition" and never just "visual object
> recognition".
>
> To realise this, look at how you look at four pictures:
>
> http://media.oregonlive.com/portland_impact/photo/barber-shootingjpg-fca82e981d021b3b.jpg
>
> http://www.forgotmylines.com/wp-content/uploads/2010/07/ReasonstobePrettyFIghtScene.jpg
>
> http://2.bp.blogspot.com/_ASZ1J20yTgQ/TLuZJT-TgXI/AAAAAAAAAA0/ps7cifXlBWo/s1600/DSC01098.JPG
>
> http://simonhalliday.files.wordpress.com/2009/03/library8003.jpg
>
> What you see - what "visual scene recognition" means first of all - is
> [seeing] **object[s]-in-a-field**.
>
> You see - a policeman putting up a tape on a street, kids fighting on a
> playground [or similar], a guy playing tennis on a tennis court, books and
> candles on a table in some kind of library.
>
> You don't just see isolated objects - you automatically see
> objects-doing-things-in-a-field. You aren't just aware vaguely of the
> "gestalt" or the "context", which is the vague way some vision philosophy
> has thought about these things.  You immediately look to see what the
> principal objects are doing in that whole field.  Your life in the real
> world depends on quickly working that out - otherwise that
> car/animal/falling masonry may hit you. Every scene is potentially
> dangerous.
>
> But current AI and scientific thinking never - characteristically - "sees
> the big picture".  AI and science metacognitively only see isolated objects
> -  they see the world like this:
>
> http://www.freevector.com/site_media/preview_images/FreeVector-People-Vector-Art.jpg
>
> They don't see or look for the scene -  the field surrounding those objects,
> and what those objects are doing in (& how they're relating to) that field.
>
> And this applies to everything AI touches.  Its vision is always fragmented.
> It sees only fragmented words and sentences:
>
> http://img2.etsystatic.com/000/0/5464342/il_fullxfull.130745770.jpg
>
> It never sees the text as a whole. And human readers always read
> sentences-in-a-text, (and talk sentences-in-a-conversation), never just
> sentences.
>
> But even "visual scene recognition" is too simplistic!
>
> Actually, it's always "visual movie theatre recognition".
>
> Human vision is never vision of just a scene out there. It's simultaneously:
>
> "vision of an observer (oneself) viewing the scene"  [/objects in a field].
> The acting observer seeing as well as the scene seen.
>
> We always see ourselves watching the scene - are always aware of the point
> of view of the scene, and its distance from us (as I have detailed before on
> this forum). In real life, of course, we are normally acting in that field
> and moving through it - and the position of objects in relation to us is
> vital information.
>
> And then "visual movie theatre recognition" entails not just visual-scene,
> but visual-scene-in-a-MOVIE recognition!! (in a whole story!)
>
> We don't just see a scene in an isolated moment in time, but are aware of it
> as a scene-in-a-stream-of-scenes - a movie.
>
> We don't just see objects in a field in a timeless moment - we place them
> as moving in time as well as space. When we look at those kids fighting, we
> don't just see them occupying isolated postures - we see them as moving in
> time - having started a fight beforehand, and still to finish it. When we
> look at the policeman we are aware that he has come up to that tape in the
> past, and will move away from it in the future.
>
> And similarly, we don't just see objects-in-a-field,  but
> objects-in-a-field-in-a-WORLD.  As objects not just in a single "theatre of
> operations" but a whole world-of-operations.  We are aware of the fields
> that lie beyond the immediate field - what lies beyond the movie theatre -
> this may be crucial to understanding what is happening. We couldn't
> understand this picture if our vision were confined to the immediate
> wing/field shown, and not the fields beyond the picture:
>
> http://www.google.co.uk/imgres?um=1&hl=en&biw=1645&bih=767&tbm=isch&tbnid=lBHESwvnTIYWOM:&imgrefurl=http://www.advrider.com/forums/showthread.php%3Ft%3D499369%26page%3D26&docid=Uoxo7hdI1Bj4MM&imgurl=http://rookery2.viary.com/storagev12/1189500/1189682_2b3b_625x625.jpg&w=399&h=300&ei=m6deUbWqHeOm0QWjtYDIBw&zoom=1&ved=1t:3588,r:22,s:0,i:153&iact=rc&dur=2325&page=1&tbnh=184&tbnw=251&start=0&ndsp=28&tx=130&ty=89
>
> And actually - well you knew this was coming, right? - we don't just see
> "objects" - lifeless things.  We see BODIES. We always see bodies. We're
> always aware of how those bodies move - whether they're alive or dead - and
> how they're moving.
>
> And "visual body recognition" is always EMBODIED.  You can't understand how
> other bodies - especially other human bodies  - move if you haven't got a
> body yourself - and can't use that body to simulate how other bodies do and
> will move.
>
> So let's see [!] what we have here.
>
> "Visual recognition" - human vision - is a helluva lot more than "visual
> object recognition".  "Visual scene recognition" is a bare minimum. "Visual
> movie theatre recognition" is more like it.
>
> And that means embodied-vision-of-a-viewer-viewing-objects/bodies-moving-
> at-points-in-a-stream-of-movement-in-a-field-within-a-world-of-fields.
>
> And that's the easy-peasy part.  Then we get on to understanding language -
> and general conceptualisation - which deals with whole classes of objects in
> whole classes of fields, etc., in whole classes of worlds.
>
> Vision then entails not just the small, scientific, fragmented, narrow,
> narrow-minded, totally blinkered picture of objects, but the big, artistic,
> integrated, broad-minded, panoramic picture of what lies beyond that object
> in space and time - in a field in a world of fields, in a scene in a stream
> of scenes.
>
> How many quadrillions you got, Matt?
>
> -------------------------------------------
> AGI
> Archives: https://www.listbox.com/member/archive/303/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/303/3701026-786a0853
> Modify Your Subscription:
> https://www.listbox.com/member/?&;
> Powered by Listbox: http://www.listbox.com



-- 
-- Matt Mahoney, [email protected]


