On 2/17/08, Nicholas Thompson <[EMAIL PROTECTED]> wrote:
> Robert,
>
> Thanks for these comments. Are you actually a person who could make me
> understand Bayes intuitively, a little bit?
> You could have free coffee from me anytime you wanted to try that.

Here's a basic rundown of Bayes. I am not an expert; this is more or less
as much as I know. I can't collect the free coffee as I'm in Los Angeles
right now, but maybe it'll help.

Bayes' theorem goes like this:

p(a ^ b) = (p(b ^ a) * p(a)) / p(b)

where ^ means "given." So the probability of A, given B, is equal to
(the probability of B, given A, times the base probability of A itself)
divided by the base probability of B itself.

> The question was, given a pansy, what is the probability of [pansy-blooming
> April 1 in Santa Fe]. So the data could be faulted in two different ways.
> Tree-hugger Jones could not know what a pansy is, and report the blooming
> of a "forget-me-not" on April first; or TJ could have the date
> wrong. Or he could report his geographic coordinates wrong. The hardest
> of these is the plant identification part, I would think.

So I don't know if you could actually model this in a Bayesian way. You
are basically modelling cause and effect with Bayes. This problem with
the flower blooming at a particular time in a particular place is just a
combination of probabilities.

A nice canonical Bayes example is, given that the grass is wet, what is
the probability that it rained last night?

a = rained last night
b = grass wet

p(a ^ b) = probability that it rained last night, given that the grass is wet
p(b ^ a) = probability that the grass is wet, given that it rained last night (100%)
p(a) = base probability of it raining last night
p(b) = base probability of grass being wet

p(b) will reflect both times when the grass was wet because it rained
and times when the grass was wet because the automatic sprinklers turned
on, or the kids were throwing water balloons at each other. p(a) can be
high or low depending on the time of year. But AFAIK you do need to
initially collect some data on the general probability of the grass
being wet, given that it rained last night, to solve the equation at all.
That's why this is a canonical example; it's easy to see that p(b ^ a)
will be about 100%, because lawns generally don't dry out until the sun
comes up.

So predicting the probability of a particular flower blooming in a
particular place at a particular time is calculating the probability of
a coincidence, whereas Bayes is really all about cause and effect and
pattern recognition, or inference - when X happens, Y often happens too,
so since I know Y obviously happened here, how likely is it that X
happened also? It's basically an equation that can do simple kinds of
detective work.

For example, I'm working on something I can't necessarily describe in
too much detail, but it's a Web application which creates probability
matrices, such that it will know that if User X is in Category Y, they
probably want to look at item Z. That's cool because we can say, "hello
web site user, you probably want to see item Z!" and make the text for
item Z bold or bright red so it's easy for them to find it.

But over time, we can not only get these probability matrices fairly
accurate - because you have to acquire a bunch of data before they
become genuinely useful - but we can also collect the number of times
*anybody* clicked item Z or entered category Y. Since we can collect
those numbers, we can calculate base probabilities for category Y and
item Z. And since we know the probability that user X enters category Y
looking for item Z, when somebody enters category Y looking for item Z,
we'll be able to calculate the probability that they're user X.

And that becomes useful if we know other things about user X - for
instance, user X always chooses FedEx for their shipping method. If we
calculate a high probability that this user entering Y looking for Z is
the user X we already know about, then we go ahead and make FedEx the
first option in the list of shipping options, and put the link in bold
text and make it bright red just to make life easier for user X.

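To put rough numbers on the two examples above, here is a minimal Python
sketch. Every number in it is invented purely for illustration, and the
names (bayes, visits_by_x, the 0.7 cutoff and so on) are hypothetical;
none of this is taken from the actual application. It only shows the
arithmetic of the formula, using "a" and "b" the same way as above.

```python
def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return p(a given b) = p(b given a) * p(a) / p(b)."""
    if not 0.0 < p_b <= 1.0:
        raise ValueError("p(b) must be greater than 0 and at most 1")
    return p_b_given_a * p_a / p_b


# Wet grass example. All three inputs are made-up values.
p_rain = 0.2            # p(a): base probability of rain last night
p_wet_given_rain = 1.0  # p(b ^ a): grass is wet whenever it rained
p_wet = 0.4             # p(b): wet from rain, sprinklers, or water balloons

p_rain_given_wet = bayes(p_wet_given_rain, p_rain, p_wet)
print(round(p_rain_given_wet, 2))  # about 0.5 with these numbers

# Website example. The visit counts are hypothetical.
total_visits = 1000    # all recorded visits
visits_by_x = 50       # visits made by user X
visits_yz = 40         # visits by anybody entering category Y for item Z
visits_yz_by_x = 30    # visits by user X entering category Y for item Z

p_x = visits_by_x / total_visits             # base probability of user X
p_yz = visits_yz / total_visits              # base probability of "Y, then Z"
p_yz_given_x = visits_yz_by_x / visits_by_x  # how often X does "Y, then Z"

p_x_given_yz = bayes(p_yz_given_x, p_x, p_yz)
print(round(p_x_given_yz, 2))  # about 0.75 with these numbers

# Assumed policy: promote FedEx only when we are fairly sure it is user X.
CUTOFF = 0.7  # arbitrary threshold chosen for this sketch
first_shipping_option = "FedEx" if p_x_given_yz > CUTOFF else "default"
print(first_shipping_option)
```

With these made-up counts the answer could also be read straight off the
data (30 of the 40 "Y, then Z" visits were user X), which is a handy
sanity check that the formula is being applied the right way round.
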
Basically, you know how, if you get coffee at the same place every time,
you don't have to tell them what you want? They see you come in the door
and they start making the one-shot 12oz. soy latte with cinnamon and
they ring it up for you without you having to describe it in detail
every time? Bayes' theorem allows websites to do the same thing, in some
cases.

--
Giles Bowkett

Podcast: http://hollywoodgrit.blogspot.com
Blog: http://gilesbowkett.blogspot.com
Portfolio: http://www.gilesgoatboy.org
Tumblelog: http://giles.tumblr.com

============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
lectures, archives, unsubscribe, maps at http://www.friam.org
