Hi Cliff and others,

As I came up with this kind of a test perhaps I should
say a few things about its motivation...

The problem was that the Webmind system had a number of
proposed reasoning systems and it wasn't clear which was
the best. Essentially the reasoning systems took as input
a whole lot of data like:

Fluffy is a Cat
Snuggles is a Cat
Tweety is a Bird
Cats are animals
Cats are mammals
Cats are dogs

and so on... This data might have errors, it might be
very biased in its sample of the outside world, it might
contain contradictions and so on... nevertheless we
would expect some basic level of consistency in it.

The reasoning systems take this and come up with all
sorts of conclusions like: Fluffy is an animal based
on the fact that Fluffy is a Cat and Cats seem to be
animals... In a sense the reasoning system is trying
to "fill in the gaps" in our data by looking at the
data it has and drawing simple conclusions.
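Just to make that concrete, here is a tiny toy sketch of the sort
of gap filling I mean (my own illustration in Python, not Webmind
code, and purely deterministic where the real systems dealt in
probabilities):

facts = {
    ("Fluffy", "Cat"),
    ("Snuggles", "Cat"),
    ("Tweety", "Bird"),
    ("Cat", "animal"),
    ("Cat", "mammal"),
}

def fill_gaps(facts):
    # Repeatedly apply "X is A, A is B  =>  X is B" until nothing new appears.
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (x, a) in list(derived):
            for (a2, b) in list(derived):
                if a == a2 and (x, b) not in derived:
                    derived.add((x, b))   # e.g. Fluffy is an animal
                    changed = True
    return derived - facts

print(sorted(fill_gaps(facts)))
# [('Fluffy', 'animal'), ('Fluffy', 'mammal'),
#  ('Snuggles', 'animal'), ('Snuggles', 'mammal')]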

So what I wanted to do was to somehow artificially
generate test sets that I could use to automatically
test the systems against each other. I would vary the
number of entities in the space (Fluffy, Cat, Bird...),
the amount of noise in the data set, the number of
data points and so on...

Now the problem is that you can't just randomly generate
any old data points; you actually need at least some kind
of consistency, which is a bit tricky when you have some
A's being B's and most B's being C's and all B's not being
D's but all D's being A's. Before long your data is totally
self-contradictory and you are basically just feeding your
reasoning system complete junk, so it isn't a very
interesting test of the system's ability.

So my idea was basically to create a virtual Venn diagram
using randomly placed rectangles as the sets used to compute
the probability for each entity in the space and the conditional
probabilities of their various intersections. This way your
fundamental underlying system has consistent probabilities,
which is a good start. You can then randomly sample points
from the space or directly compute the probabilities from the
rectangle areas (actually n dimensional rectangles as this gives
more interesting intersection possibilities) and so on to get
your data sets. You can then look at how well the system is
able to approximate the true probabilities based on the
incomplete data that it has been given (you can compute the
true probabilities directly as you know the rectangle areas).
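
To give a rough feel for it, here is a little sketch (again my own
illustration, the function names and parameters are not from the
actual Webmind test code): each entity is a randomly placed
axis-aligned box inside the unit n-cube, the true conditional
probabilities come straight from the box volumes, and the noisy
estimates come from sampled points.

import random

def random_box(dim):
    # A random axis-aligned box inside [0,1]^dim, one box per entity.
    lo = [random.uniform(0.0, 0.8) for _ in range(dim)]
    hi = [l + random.uniform(0.1, 1.0 - l) for l in lo]
    return list(zip(lo, hi))

def volume(box):
    v = 1.0
    for lo, hi in box:
        v *= hi - lo
    return v

def intersect(a, b):
    # Intersection of two boxes, or None if they don't overlap.
    out = []
    for (alo, ahi), (blo, bhi) in zip(a, b):
        lo, hi = max(alo, blo), min(ahi, bhi)
        if lo >= hi:
            return None
        out.append((lo, hi))
    return out

def true_conditional(a, b):
    # Exact P(B | A) from the box volumes: vol(A ∩ B) / vol(A).
    ab = intersect(a, b)
    return 0.0 if ab is None else volume(ab) / volume(a)

def contains(box, point):
    return all(lo <= x <= hi for (lo, hi), x in zip(box, point))

def sampled_conditional(a, b, n_samples=10000, noise=0.0):
    # Estimate P(B | A) from random points, flipping some labels as noise.
    in_a = in_ab = 0
    for _ in range(n_samples):
        p = [random.random() for _ in range(len(a))]
        hit_a, hit_b = contains(a, p), contains(b, p)
        if noise and random.random() < noise:
            hit_b = not hit_b             # corrupt the label
        if hit_a:
            in_a += 1
            in_ab += hit_b
    return in_ab / in_a if in_a else 0.0

# e.g. two entities ("Cat" and "animal") as boxes in a 3 dimensional space:
dim = 3
cat, animal = random_box(dim), random_box(dim)
print("true    P(animal | Cat):", true_conditional(cat, animal))
print("sampled P(animal | Cat):", sampled_conditional(cat, animal, noise=0.05))

Comparing the sampled figure (which is all a reasoning system would
effectively have to work from) against the exact figure is the basic
measure of how well a system recovers the underlying probabilities.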

I think I proposed about 6 or so basic variations on this
theme to test the reasoning system's ability to deal with
various levels of noise and missing data... you can come up
with all sorts of interesting variations with a bit of thought.

Yeah, just a fancy Venn diagram really used to generate
reasonably consistent data sets.

Cheers
Shane
