I've suddenly realized that computer vision on real images is very much
solvable and is now just a matter of engineering. I was stuck before
because you can't make the same simplifying assumptions in screenshot
computer vision that you can in real-world computer vision; that probably
makes experience necessary to learn effectively from screenshots. Objects
in real images do not change drastically in appearance, position, or other
dimensions in unpredictable ways.

The reason I concluded it's much easier than I thought is that I found a
way to describe why existing solutions work, how they work, and how to
come up with even better ones.

I've also realized that I don't actually have to implement it, which is
the most difficult part: even if you know that a solution to part of the
problem has certain properties and issues, implementing it takes a lot of
time. Instead, I can assume I have a less-than-perfect solution with the
properties I predict from other experiments, and then solve the problem
without implementing every last detail.

*First*, existing methods find observations that are likely to be true on
their own. They find data patterns that are very unlikely to occur by
coincidence, such as many features moving together over several frames of a
video and over a statistically significant distance. They use thresholds to
ensure that the observed changes are plausible transformations of the
originally observed property, or to ensure the statistical significance of
an observation. Such observations are highly likely to be true rather than
coincidences or noise.
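The "many features moving together" pattern can be sketched as a simple check: every track in a group must cover a significant distance, and every track's net displacement must agree with the group's mean. This is a minimal illustration, not any particular library's tracker, and the thresholds are made-up values:

```python
import math

def coherent_track_group(tracks, min_distance=5.0, max_spread=1.0):
    """Check whether a group of feature tracks moves together far enough
    to be unlikely as a coincidence.

    tracks: list of tracks, each a list of (x, y) positions per frame.
    min_distance: total displacement each track must cover (pixels).
    max_spread: max allowed deviation of a track's net displacement
                from the group mean (pixels). Illustrative values.
    """
    # Net displacement of each track from first to last frame.
    disps = [(t[-1][0] - t[0][0], t[-1][1] - t[0][1]) for t in tracks]
    # Every track must have moved a statistically significant distance.
    if any(math.hypot(dx, dy) < min_distance for dx, dy in disps):
        return False
    # Tracks must move together: each displacement close to the mean.
    mx = sum(dx for dx, _ in disps) / len(disps)
    my = sum(dy for _, dy in disps) / len(disps)
    return all(math.hypot(dx - mx, dy - my) <= max_spread
               for dx, dy in disps)

# Three features translating together by (10, 0) over three frames.
tracks = [
    [(0, 0), (5, 0), (10, 0)],
    [(2, 3), (7, 3), (12, 3)],
    [(4, 1), (9, 1), (14, 1)],
]
print(coherent_track_group(tracks))  # True
```

Three noisy or stationary tracks would fail either the distance or the spread test, which is exactly the "unlikely to be coincidence" filter described above.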

*Second*, they make sure that the other possible explanations of the
observations are very unlikely. This is usually done with a second
threshold on the difference between the best match and the second-best
match, which ensures that the runner-up is much farther away than the best
match. This matters because it's not enough to find a very likely match if
there are 1,000 very likely matches. You have to show that the other
matches are very unlikely; otherwise the match you pick may be only a tiny
bit better than the others, and your confidence in it would be very low.
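This best-versus-second-best check is essentially Lowe's ratio test from SIFT matching. A minimal sketch, with an illustrative ratio threshold and plain Euclidean distance on descriptor vectors:

```python
def ratio_test_match(query_desc, candidate_descs, max_ratio=0.7):
    """Accept a match only if the best candidate is much closer than the
    second best (Lowe-style ratio test). Descriptors are equal-length
    numeric vectors; max_ratio is an illustrative threshold.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # Rank candidates by distance to the query descriptor.
    ranked = sorted(range(len(candidate_descs)),
                    key=lambda i: dist(query_desc, candidate_descs[i]))
    best, second = ranked[0], ranked[1]
    d1 = dist(query_desc, candidate_descs[best])
    d2 = dist(query_desc, candidate_descs[second])
    # Reject ambiguous matches: best must be clearly better than runner-up.
    if d1 / d2 < max_ratio:
        return best
    return None

# Unambiguous: one candidate is far closer than the rest.
print(ratio_test_match([0.2, 0], [[0, 0], [5, 5], [9, 9]]))  # 0
# Ambiguous: two candidates are equally close, so no match is returned.
print(ratio_test_match([0.1, 0], [[0, 0], [0.2, 0]]))        # None
```

The second call is the "1,000 very likely matches" failure mode in miniature: the best candidate is likely on its own, but it is rejected because an alternative explains the observation almost as well.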


So, my initial design plans are as follows. Note: I will probably not
actually implement the system, because the engineering dominates the time.
I'd rather convert real videos into pseudo or simulated test cases and then
write a pseudo-design and algorithm that can solve them. That would show
the approach works without spending the time needed to implement it. It's
more important for me to prove it works and show what it can do than to
actually build it. If I can prove it, there will be sufficient motivation
for others to do it with more resources and manpower than I have at my
disposal.

*My Design*
*First*, we use high-speed cameras and lidar systems to gather sufficient
data with very low uncertainty: the changes possible between consecutive
data points can be assumed to be very small, allowing our thresholds to be
much tighter, which eliminates many possible errors and ambiguities.
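The frame-rate-to-threshold relationship can be made concrete with a back-of-envelope pinhole-camera bound. All the numbers below (speed, focal length, depth) are illustrative assumptions, not calibrated values:

```python
def max_displacement_px(fps, max_speed_mps, focal_px, depth_m):
    """Upper bound on per-frame image displacement (pixels) for an object
    at a given depth, under a simple pinhole model. All parameters are
    illustrative assumptions.
    """
    max_motion_m = max_speed_mps / fps  # worst-case 3D motion per frame (m)
    # Pinhole projection: lateral motion of x metres at depth d appears
    # as roughly focal_px * x / d pixels in the image.
    return focal_px * max_motion_m / depth_m

# A 30 m/s object 10 m away, with an 800 px focal length:
print(max_displacement_px(240, 30.0, 800.0, 10.0))  # 10.0 px at 240 fps
print(max_displacement_px(30, 30.0, 800.0, 10.0))   # 80.0 px at 30 fps
```

Going from 30 fps to 240 fps shrinks the worst-case search window by 8x in each dimension, which is exactly why the matching thresholds can be so much tighter.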

*Second*, we have to gain experience from high-confidence observations.
These are gathered as follows:
1) Describe the allowable transformations (thresholds) and what they mean,
such as the change in size and position of an object given the camera's
frame rate. Another might be the allowable change in hue and contrast due
to lighting changes. With a high-frame-rate camera, if you can find a match
within these high-confidence thresholds across multiple dimensions (size,
position, color, etc.), then you have a high-confidence match.
2) Find data patterns that are very unlikely to occur by coincidence, such
as many features moving together over several frames of a video and over a
statistically significant distance. These are highly likely to be true
observations rather than coincidences or noise.
3) Most importantly, make sure the matches we find are highly likely on
their own and unlikely to be coincidental.
4) Second most importantly, make sure that any other possible matches or
alternative explanations are very unlikely in terms of distance (measured
in multiple dimensions and weighted by the certainty of those
observations). These distances should also be expressed in terms of the
thresholds used previously, because those define acceptable changes in a
normalized way.
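The steps above can be sketched as one matching function: normalise each dimension by its allowable change (so 1.0 means "at the threshold", as in step 4), accept only candidates inside every threshold (step 1), and reject the match if a runner-up explains the observation nearly as well (steps 3-4). The dimensions and threshold values are hypothetical placeholders:

```python
def high_confidence_match(obs, candidates, thresholds, ambiguity_ratio=0.6):
    """Sketch of steps 1-4: threshold-normalised multi-dimensional matching
    with an ambiguity check.

    obs / candidates: dicts of dimension -> value (e.g. x, y, size).
    thresholds: dict of dimension -> max allowable per-frame change.
    ambiguity_ratio: best/second-best ratio above which we abstain.
    """
    def normalised_dist(a, b):
        # Distance where 1.0 in any dimension means "at the threshold".
        return max(abs(a[k] - b[k]) / thresholds[k] for k in thresholds)

    scored = sorted((normalised_dist(obs, c), i)
                    for i, c in enumerate(candidates))
    best_d, best_i = scored[0]
    if best_d > 1.0:          # outside the allowable transformations
        return None
    if len(scored) > 1:
        second_d, _ = scored[1]
        # Reject if another candidate explains the observation nearly
        # as well as the best one.
        if second_d <= 1.0 and best_d / max(second_d, 1e-9) > ambiguity_ratio:
            return None
    return best_i

thresholds = {'x': 8, 'y': 8, 'size': 2}
obs = {'x': 100, 'y': 50, 'size': 20}
# One candidate inside the thresholds, one far outside: confident match.
print(high_confidence_match(
    obs, [{'x': 103, 'y': 51, 'size': 20},
          {'x': 160, 'y': 90, 'size': 30}], thresholds))  # 0
```

With two candidates both inside the thresholds and nearly equidistant, the function returns None: a likely match is not enough when the alternatives are also likely.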

*That is a rough description of the idea: matches that are highly likely
on their own, and very unlikely to be incorrect, coincidental, or
mismatched.*

*Third*, we use experience, when we have it, in combination with the
algorithm just described. If we can find unlikely coincidences between our
experience and our raw sensory observations, we can look specifically for
the important observations the experience predicts and verify them, which
in turn gives us higher-confidence inferences.
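One minimal way to sketch this experience-driven verification: a learned model predicts where something should appear, and we search only a small gate around that prediction, treating a lone detection inside the gate as a verified, higher-confidence observation. The gate size is an illustrative threshold, and the prediction source is assumed, not specified:

```python
def verify_predicted(prediction, detections, gate_px=5.0):
    """Verify a prediction from experience against raw detections.

    prediction: (x, y) where experience says the object should be.
    detections: list of (x, y) raw detections in the current frame.
    gate_px: illustrative search radius around the prediction.
    Returns the detection if exactly one falls inside the gate
    (an unambiguous verification), else None.
    """
    px, py = prediction
    inside = [(x, y) for x, y in detections
              if ((x - px) ** 2 + (y - py) ** 2) ** 0.5 <= gate_px]
    # Exactly one detection in the gate = unambiguous verification.
    return inside[0] if len(inside) == 1 else None

# Experience predicts (10, 10); one detection confirms it, one is far away.
print(verify_predicted((10, 10), [(11, 10), (40, 40)]))  # (11, 10)
```

This is the same two-part logic as before, applied top-down: the predicted coincidence makes the match likely, and the empty remainder of the gate makes the alternatives unlikely.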

Once we have solved the correspondence problem like this, we can perform
higher reasoning and learning.

Dave



-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/