Re: [agi] Computer Vision not as hard as I thought!

David Jones Wed, 04 Aug 2010 06:27:43 -0700

:D Thanks Jim for paying attention!

One very cool thing about the human brain is that we use multiple feedback
mechanisms to correct for such problems as observer movement. For example,
the inner ear senses your body's movement and provides feedback for visual
processing. This is why we get nauseous when the ear disagrees with the eyes
and other senses. As you said, eye muscles also provide feedback about how
the eye itself has moved. In example papers I have read, such as "Object
Discovery through Motion, Appearance and Shape", the researchers know the
position of the camera (I'm not sure how) and use that to determine which
moving features are closest to the camera's movement, and therefore are not
actually moving. Once you know how much the camera moved, you can try to
subtract this from apparent motion.
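
As a rough sketch of that subtraction (my own illustration, not the method
from the paper): assume we already have, for each tracked feature, its
apparent image motion and the motion that the camera movement alone would
have caused. The function names and the 2-pixel threshold are made up for
the example, and a real system would have to estimate the camera-induced
flow per feature, since it depends on depth when the camera translates.

```python
from math import hypot

def residual_motion(apparent, camera_flow):
    """Subtract the camera-induced flow from each feature's apparent flow."""
    return [(ax - cx, ay - cy)
            for (ax, ay), (cx, cy) in zip(apparent, camera_flow)]

def moving_features(apparent, camera_flow, min_pixels=2.0):
    """Indices of features whose leftover motion exceeds min_pixels."""
    residual = residual_motion(apparent, camera_flow)
    return [i for i, (rx, ry) in enumerate(residual)
            if hypot(rx, ry) > min_pixels]

# The camera pans so the background appears to shift 5 px to the left.
# Features 0 and 1 are background; feature 2 really moves.
apparent = [(-5.0, 0.0), (-5.2, 0.1), (3.0, 4.0)]
camera_flow = [(-5.0, 0.0)] * 3  # assumed known, e.g. from a motion sensor
print(moving_features(apparent, camera_flow))  # prints [2]
```

Features whose apparent motion matches the camera's are treated as
stationary; only the leftover motion is attributed to the object itself.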

You're right that I should attempt to implement the system. I think I will
in fact, but it is difficult because I have limited time and resources. My
main goal is to make sure it is accomplished, even if not by me. So,
sometimes I think that it is better to prove that it can be done than to
spend a much longer amount of time actually doing it myself. I am
struggling to figure out how I can gather the resources or support to
accomplish the monstrous task. I think that I should work on the theoretical
basis in addition to the actual implementation. This is likely important to
make sure that my design is well grounded and reflects reality. It is very
hard for me to balance everything that has to be done though. This
definitely should be done by a much larger team of people.

As for your belief that computer vision is not necessary for AGI, I just
finished writing an email to someone else who had similar questions
regarding why computer vision helps with AGI. I will append it here. I
hope you find it helpful.

*********Appended Below: Why do I think computer vision is so important for
AGI.**********

Someone asked, if I solved computer vision, why would it help in higher
reasoning and learning? Why do I think computer vision is so important for AGI?

*Regarding higher reasoning and learning, I'll try to explain my views a bit
here:*

*When I talk about higher reasoning and learning, I'm referring to all of
the following:*
* language learning
* language disambiguation and interpretation
* learning about cause and effect
* learning about object/environment behavior and mechanisms regarding how or
why they behave certain ways
* explanatory reasoning that requires common sense knowledge
* learning common sense knowledge at increasing levels of abstraction.
* trial and error learning
* learning to predict the environment. This is extremely important for the
purposes of goal pursuit, which is the whole point of AI I think.
* inductive learning on examples from observation. This is needed for
language learning. This also helps with predicting the behavior of new
object instances.
* rule induction from observed examples
* etc. etc. etc.

*So, what am I really using computer vision for?*
I'm using computer vision to gather *knowledge*, including common sense
knowledge. It is very clear to me, and probably many others, that knowledge
is required to solve the problems we want AI to solve. The core problem of
AGI is knowledge. There are many other supporting problems such as machine
learning, planning, language disambiguation, etc., but without knowledge it
is much harder than it needs to be. We need it, we want it, but we haven't
been able to get enough of it. Knowledge also makes it easier to solve these
problems, making it possible to use simpler algorithms and to learn from
fewer examples.

Computer vision isn't just for knowledge though. It's also for goal pursuit.
Many things we want an AI to do require exploration of the environment,
trial and error learning, exploration in general, interaction with the
environment, unsupervised learning, etc. These require the ability to
perceive and understand the environment. The environment is too complex to
predict blindly. It is absolutely essential to be able to directly observe
it and receive feedback.

*So, why computer vision? Why can't we just enter knowledge manually?*

Explaining this requires several supporting arguments that will have to be
argued separately. So I will list them below:

a) The knowledge we require for AI to do what we want is vast and complex
and we can prove that it is completely ineffective to enter the knowledge we
need manually.
b) Computer vision is the most effective means of gathering facts about the
world. Knowledge and experience can be gained from analysis of these facts.
c) Language is not learned through passive observation. The associations
that words have to the environment and our common sense knowledge of the
environment/world are absolutely essential to language learning,
understanding and disambiguation. When visual information is available,
children use visual cues from their parents and from the objects they are
interacting with to figure out word-environment associations. If visual info
is not available, touch is essential to replace the visual cues. Touch can
provide much of the same info as vision, but it is not as effective because
not everything is in reach and it provides less information than vision.
There is some very good documentation out there on how children learn
language that supports this. One example is "How Children Learn Language" by
William O'Grady.
d) The real world cannot be predicted blindly. It is absolutely essential to
be able to directly observe it and receive feedback.
e) Manual entry of knowledge, even if possible, would be extremely slow and
would be a very serious bottleneck (it already is). This is a major reason we
want AI... to increase our man power and remove man-power related
bottlenecks.

I could argue the above pieces separately. But, since the email is already
long, I'll leave it at that for now. If you want to explore any of them
further, I can delve more into them.

On Wed, Aug 4, 2010 at 9:10 AM, Jim Bromer <jimbro...@gmail.com> wrote:

> On Tue, Aug 3, 2010 at 11:52 AM, David Jones <davidher...@gmail.com> wrote:
> I've suddenly realized that computer vision of real images is very much
> solvable and that it is now just a matter of engineering...
> I've also realized that I don't actually have to implement it, which is
> what is most difficult because even if you know a solution to part of the
> problem has certain properties and issues, implementing it takes a lot of
> time. Whereas I can just assume I have a less than perfect solution with the
> properties I predict from other experiments. Then I can solve the problem
> without actually implementing every last detail...
> *First*, existing methods find observations that are likely true by
> themselves. They find data patterns that are very unlikely to occur by
> coincidence, such as many features moving together over several frames of a
> video and over a statistically significant distance. They use thresholds to
> ensure that the observed changes are likely transformations of the original
> property observed or to ensure the statistical significance of an
> observation. These are highly likely true observations and not coincidences
> or noise.
>  --------------------------------------------------
> Just looking at these statements, I can find three significant errors. (I
> do agree with some of what you said, like the significance of finding
> observations that are likely true in themselves.)  When the camera moves (in
> a rotation or pan) many features will appear 'to move together over a
> statistically significant distance'.  One might suppose that the animal can
> sense the movement of his own eyes but then again, he can rotate his head
> and use his vision to gauge the rotation of his head.  So right away there
> is some kind of serious error in your statement.  It might be resolvable, it
> is just that your statement does not really do the resolution.  I do believe
> that computer vision is possible with contemporary computers but you are
> also saying that while you can't get your algorithms to work the way you had
> hoped, it doesn't really matter because you can figure it all out without
> the work of implementation.  My point of view is that these represent major
> errors in reasoning.
> I hope to get back to actual visual processing experiments again.  Although
> I don't think that computer vision is necessary for AGI, I do think that
> there is so much to be learned from experimenting with computer vision that
> it is a serious mistake not to take advantage of the opportunity.
> Jim Bromer
>
>
> On Tue, Aug 3, 2010 at 11:52 AM, David Jones <davidher...@gmail.com> wrote:
>
>> I've suddenly realized that computer vision of real images is very much
>> solvable and that it is now just a matter of engineering. I was so stuck
>> before because you can't make the simple assumptions in screenshot computer
>> vision that you can in real computer vision. This makes experience probably
>> necessary to effectively learn from screenshots. Objects in real images do
>> not change drastically in appearance, position or other dimensions in
>> unpredictable ways.
>>
>> The reason I came to the conclusion that it's a lot easier than I thought
>> is that I found a way to describe why existing solutions work, how they work
>> and how to come up with even better solutions.
>>
>> I've also realized that I don't actually have to implement it, which is
>> what is most difficult because even if you know a solution to part of the
>> problem has certain properties and issues, implementing it takes a lot of
>> time. Whereas I can just assume I have a less than perfect solution with the
>> properties I predict from other experiments. Then I can solve the problem
>> without actually implementing every last detail.
>>
>> *First*, existing methods find observations that are likely true by
>> themselves. They find data patterns that are very unlikely to occur by
>> coincidence, such as many features moving together over several frames of a
>> video and over a statistically significant distance. They use thresholds to
>> ensure that the observed changes are likely transformations of the original
>> property observed or to ensure the statistical significance of an
>> observation. These are highly likely true observations and not coincidences
>> or noise.
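
To make the "moving together" test concrete, here is a minimal sketch under
my own assumptions: each feature is a track of (x, y) positions, one per
frame, and a group counts as real only if enough tracks travel far enough
and agree with the median displacement. The function name and all the
default thresholds are hypothetical, chosen just for illustration.

```python
from math import hypot
from statistics import median

def coherent_group(tracks, min_distance=10.0, tolerance=2.0,
                   min_features=3, min_frames=3):
    """Indices of tracks that moved far and moved together, else []."""
    # Net displacement of each track from its first to its last frame.
    disp = [(t[-1][0] - t[0][0], t[-1][1] - t[0][1]) for t in tracks]
    # Keep tracks that are long enough and travelled a significant distance.
    far = [i for i, (dx, dy) in enumerate(disp)
           if len(tracks[i]) >= min_frames and hypot(dx, dy) >= min_distance]
    if len(far) < min_features:
        return []
    # The median displacement stands in for the shared motion of the group.
    mx = median(disp[i][0] for i in far)
    my = median(disp[i][1] for i in far)
    group = [i for i in far
             if hypot(disp[i][0] - mx, disp[i][1] - my) <= tolerance]
    return group if len(group) >= min_features else []

tracks = [
    [(0, 0), (4, 0), (8, 0), (12, 0)],          # moves right
    [(5, 5), (9, 5), (13, 5), (17, 6)],         # moves right
    [(2, 9), (6, 9), (10, 9), (14, 9)],         # moves right
    [(20, 20), (20, 21), (21, 20), (20, 20)],   # jitter only
    [(30, 0), (30, 5), (30, 10), (30, 15)],     # moves far, other direction
]
print(coherent_group(tracks))  # prints [0, 1, 2]
```

The jittering track fails the distance test and the lone track fails the
agreement test, so neither is mistaken for part of the moving object.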
>>
>> *Second*, they make sure that the other possible explanations of the
>> observations are very unlikely. This is usually done using a threshold, and
>> a second difference threshold from the first match to the second match. This
>> makes sure that second best matches are much farther away than the best
>> match. This is important because it's not enough to find a very likely match
>> if there are 1000 very likely matches. You have to be able to show that the
>> other matches are very unlikely, otherwise the specific match you pick may
>> be just a tiny bit better than the others, and the confidence of that match
>> would be very low.
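
A small sketch of the two thresholds, assuming each candidate match already
has a distance score where smaller means more similar. The function name and
the default values of max_distance and min_margin are placeholders of mine,
not values taken from any particular method.

```python
def unambiguous_match(distances, max_distance=0.3, min_margin=0.2):
    """Index of the best candidate, or None if it is weak or ambiguous."""
    if not distances:
        return None
    # Candidate indices ordered from best (smallest distance) to worst.
    order = sorted(range(len(distances)), key=distances.__getitem__)
    best = order[0]
    if distances[best] > max_distance:
        return None  # the best match is not likely enough on its own
    if len(order) > 1 and distances[order[1]] - distances[best] < min_margin:
        return None  # the runner-up is too close, so confidence is low
    return best

print(unambiguous_match([0.9, 0.1, 0.8]))     # prints 1
print(unambiguous_match([0.10, 0.12, 0.11]))  # prints None
```

In the second call every candidate is a very likely match, which is exactly
why none of them can be trusted.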
>>
>>
>> So, my initial design plans are as follows. Note: I will probably not
>> actually implement the system because the engineering part dominates the
>> time. I'd rather convert real videos to pseudo test cases or simulation test
>> cases and then write a pseudo design and algorithm that can solve it. This
>> would show that it works without actually spending the time needed to
>> implement it. It's more important for me to prove it works and show what it
>> can do than to actually do it. If I can prove it, there will be sufficient
>> motivation for others to do it with more resources and man power than I have
>> at my disposal.
>>
>> *My Design*
>> *First*, we use high speed cameras and lidar systems to gather sufficient
>> data with very low uncertainty because the changes possible between data
>> points can be assumed to be very low, allowing our thresholds to be much
>> smaller, which eliminates many possible errors and ambiguities.
>>
>> *Second*, we have to gain experience from high confidence observations.
>> These are gathered as follows:
>> 1) Describe allowable transformations (thresholds) and what they mean, such
>> as the change in size and position of an object based on the frame rate of a
>> camera. Another might be allowable change in hue and contrast because of
>> lighting changes.  With a high frame rate camera, if you can find a match
>> that is within these high confidence thresholds in multiple dimensions (size,
>> position, color, etc.), then you have a high confidence match.
>> 2) Find data patterns that are very unlikely to occur by coincidence, such
>> as many features moving together over several frames of a video and over a
>> statistically significant distance. These are highly likely true
>> observations and not coincidences or noise.
>> 3) Most importantly, make sure the matches we find are highly likely on
>> their own and unlikely to be coincidental.
>> 4) Second most importantly, make sure that any other possible matches or
>> alternative explanations are very unlikely in terms of distance (measured in
>> multiple dimensions and weighted by the certainty of those observations).
>> These should also be in terms of the thresholds we used previously because
>> those define acceptable changes in a normalized way.
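
One way the thresholds in steps 1 and 4 could be expressed, as a sketch
only: each dimension gets an allowed per-frame change, and distances are
measured in units of that allowance so that position, size and hue become
comparable. The dimension names, the tolerance values and the certainty
weights are all assumptions I picked for the example.

```python
from math import sqrt

# Hypothetical allowed change per frame for each observed property.
THRESHOLDS = {"x": 4.0, "y": 4.0, "size": 0.1, "hue": 5.0}

def within_thresholds(a, b, thresholds=THRESHOLDS):
    """True when every dimension changed by no more than its allowance."""
    return all(abs(a[d] - b[d]) <= limit for d, limit in thresholds.items())

def normalized_distance(a, b, thresholds=THRESHOLDS, certainty=None):
    """Distance in threshold units, optionally weighted by certainty."""
    certainty = certainty or {}
    total = 0.0
    for dim, limit in thresholds.items():
        weight = certainty.get(dim, 1.0)  # unlisted dimensions count fully
        total += weight * ((a[dim] - b[dim]) / limit) ** 2
    return sqrt(total)

prev = {"x": 100.0, "y": 50.0, "size": 1.0, "hue": 30.0}
near = {"x": 102.0, "y": 50.0, "size": 1.02, "hue": 31.0}
far = {"x": 130.0, "y": 50.0, "size": 1.0, "hue": 30.0}

print(within_thresholds(prev, near))  # prints True
print(within_thresholds(prev, far))   # prints False
print(normalized_distance(prev, near) < normalized_distance(prev, far))
```

Because both candidates are scored in the same normalized units, the
distance of the runner-up can be compared directly against the best match.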
>>
>> *That is a rough description of the idea. Basically, we want matches that
>> are highly likely and that are very unlikely to be incorrect, coincidental
>> or mismatched.*
>>
>> *Third*, we use experience, when we have it, in combination with the
>> algorithm I just described. If we can find unlikely coincidences between our
>> experience and our raw sensory observations, we can use this to look
>> specifically for those important observations the experience predicts and
>> verify them, which will in turn give us higher confidence of inferences.
>>
>> Once we have solved the correspondence problem like this, we can perform
>> higher reasoning and learning.
>>
>> Dave

-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244&id_secret=8660244-6e7fb59c
Powered by Listbox: http://www.listbox.com
