On 2/17/08, Nicholas Thompson <[EMAIL PROTECTED]> wrote:
> Robert,
>
> Thanks for these comments.  Are you actually a person who could make me
> understand Bayes intuitively, a little bit?
> You could have free coffee from me anytime you wanted to try that.

Here's a basic rundown of Bayes. I am not an expert; this is more or
less as much as I know. I can't collect the free coffee as I'm in Los
Angeles right now, but maybe it'll help.

Bayes' theorem goes like this:

p(a ^ b) = (p(b ^ a) * p(a)) / p(b)

where ^ means "given."

So the probability of A, given B, is equal to (the probability of B,
given A, times the base probability of A itself) divided by the base
probability of B itself.
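
If you want to convince yourself the equation holds, here is a minimal
sketch in Python. The counts are invented purely for illustration (they
don't come from any real data); it works out p(a ^ b) once by counting
directly and once through the formula, and the two should agree.

```python
from fractions import Fraction

# Hypothetical counts over 100 observations (made up for illustration).
total = 100
n_a = 30         # times A happened
n_b = 40         # times B happened
n_a_and_b = 24   # times A and B both happened

p_a = Fraction(n_a, total)               # base probability of A
p_b = Fraction(n_b, total)               # base probability of B
p_b_given_a = Fraction(n_a_and_b, n_a)   # of the times A happened, how often B did

# Direct count: of the times B happened, how often A did.
direct = Fraction(n_a_and_b, n_b)

# The same quantity via Bayes' theorem.
via_bayes = p_b_given_a * p_a / p_b

print(direct, via_bayes)
```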

> The question was, given a panzy, what is the probability of [panzy-blooming
> April 1 in Santa Fe].  So the data could be faulted in two different ways.
> Tree-hugger Jones could know know what a panzy is, and report the blooming
> of a "forget-me-not" on April first;   or TJ could he could have the date
> wrong.   Or he could report his geographic coordinates wrong.  The hardest
> of these is the plant identification part, I would think.

So I don't know if you could actually model this in a Bayesian way.
You are basically modelling cause and effect with Bayes. This problem
with the flower blooming at a particular time in a particular place is
just a combination of probabilities. A nice canonical Bayes example
is, given that the grass is wet, what is the probability that it
rained last night?

a = rained last night
b = grass wet

p(a ^ b) = probability that it rained last night, given that the grass is wet
p(b ^ a) = probability that the grass is wet, given that it rained
last night (100%)
p(a) = base probability of it raining last night
p(b) = base probability of grass being wet

p(b) will reflect both times when the grass was wet because it rained
and times when the grass was wet because the automatic sprinklers
turned on, or the kids were throwing water balloons at each other.
p(a) can be high or low depending on the time of year. But AFAIK you
do need to initially collect some data on the general probability of
the grass being wet, given that it rained last night, to solve the
equation at all. That's why this is a canonical example; it's easy to
see that p(b ^ a) will be about 100%, because lawns generally don't
dry out until the sun comes up.
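
To put numbers on the wet-grass example, here is a rough sketch. Only
the 100% figure comes from the reasoning above; the rain and wet-grass
base rates are assumptions I picked for illustration, and posterior()
is just a name I made up for the helper.

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: p(a given b) = p(b given a) * p(a) / p(b)."""
    if p_b <= 0:
        raise ValueError("p(b) must be positive")
    return p_b_given_a * p_a / p_b

p_wet_given_rain = 1.0   # from the example: about 100%
p_rain = 0.2             # assumption: it rains on 20% of nights
p_wet = 0.4              # assumption: grass is wet on 40% of mornings

# Probability that it rained last night, given that the grass is wet.
p_rain_given_wet = posterior(p_wet_given_rain, p_rain, p_wet)
print(p_rain_given_wet)
```

With these made-up numbers, rain explains half of the wet mornings; the
sprinklers and water balloons account for the rest. Raise p_wet (more
sprinkler days) and the probability of rain drops.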

So to predict the probability of a particular flower blooming in a
particular place at a particular time, that's calculating the
probability of a coincidence, whereas Bayes is really all about cause
and effect and pattern recognition, or inference - when X happens, Y
often happens too, so since I know Y obviously happened here, can I
say that X must have happened also? It's basically an equation that
can do simple kinds of detective work.

For example, I'm working on something I can't necessarily describe in
too much detail, but it's a Web application which creates probability
matrices, such that it will know that if User X is in Category Y, they
probably want to look at item Z. That's cool because we can say,
"hello web site user, you probably want to see item Z!" and make the
text for item Z bold or bright red so it's easy for them to find it.
But over time, we can not only get these probability matrices fairly
accurate - because you have to acquire a bunch of data before they
become genuinely useful - but we can also collect the number of times
*anybody* clicked item Z or entered category Y.

Since we can collect those numbers, we can calculate base
probabilities for category Y and item Z. And since we know the
probability that user X enters category Y looking for item Z, and how
often user X shows up at all, when somebody enters category Y looking
for item Z, we'll be able to calculate the probability that they're
user X. And that becomes useful
if we know other things about user X - for instance, user X always
chooses FedEx for their shipping method, so if we calculate a high
probability that this user entering Y looking for Z is the user X we
already know about, then we go ahead and make FedEx the first option
in the list of shipping options, and put the link in bold text and
make it bright red just to make life easier for user X.
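
Here is a hedged sketch of that last step, not the actual application
code. Every name and number in it (user_x, everyone_else, the
probabilities, the 0.5 threshold, the shipping list) is hypothetical.
It gets p(b) by adding up over all the users who could have produced
the visit, then applies Bayes' theorem to each of them.

```python
def posterior_over_users(likelihood, prior):
    """Return p(user given event) for every user.

    likelihood[u] = p(event given user u), prior[u] = base probability of user u.
    """
    # Base probability of the event, summed over everyone who could cause it.
    p_event = sum(likelihood[u] * prior[u] for u in prior)
    return {u: likelihood[u] * prior[u] / p_event for u in prior}

# Hypothetical numbers: the event is "entered category Y looking for item Z".
prior = {"user_x": 0.1, "everyone_else": 0.9}
likelihood = {"user_x": 0.8, "everyone_else": 0.05}

post = posterior_over_users(likelihood, prior)

shipping = ["UPS", "USPS", "FedEx"]
if post["user_x"] > 0.5:  # the threshold is an arbitrary assumption
    # Stable sort: FedEx moves to the front, the rest keep their order.
    shipping.sort(key=lambda option: option != "FedEx")

print(post, shipping)
```

Even though user X is only a tenth of the traffic in this sketch, the
visit pattern is distinctive enough that the posterior comes out at
roughly 0.64, so FedEx gets listed first.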

Basically, you know if you get coffee at the same place every time,
you don't have to tell them what you want? They see you come in the
door and they start making the one-shot 12oz. soy latte with cinnamon
and they ring it up for you without you having to describe it in
detail every time? Bayes' theorem allows websites to do the same
thing, in some cases.

-- 
Giles Bowkett

Podcast: http://hollywoodgrit.blogspot.com
Blog: http://gilesbowkett.blogspot.com
Portfolio: http://www.gilesgoatboy.org
Tumblelog: http://giles.tumblr.com

============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
lectures, archives, unsubscribe, maps at http://www.friam.org
