On Fri, Sep 26, 2014 at 2:32 PM, Stanley Nilsen via AGI <[email protected]> wrote:
> 4. probabilities
> Matt states a fairly simple equation (utility * probability < cost) that
> he calls the answer. I'm not sure it's as simple as indicated - in fact I'm
> sure it is not.
It is not simple because estimating probabilities is not simple. Suppose you run an experiment some number of times and observe the following outcomes, where 1 = success and 0 = failure. For each of the following sequences, estimate the probability of success on the next trial:

1. 0110100101
2. 0000011111
3. 000
4. 0

In case 1 we observe 5 successes in 10 trials, so we guess p = 0.5. In case 2 we again observe 5 successes in 10 trials, but we are no longer justified in assuming that the trials are independent. We would assume p > 0.5 on the basis of a universal distribution, i.e. the shortest program that generates the data is the most likely. For example: output 5 zeros followed by all ones.

Cases 3 and 4 are examples of the zero frequency problem, which has received a lot of study in predictive modeling for data compression. Simply counting zeros and ones is wrong because probabilities are never exactly 0. A Laplace estimator adds 1 to each count, so for case 3, instead of p = 0/3 we have p = (0+1)/(3+2) = 0.2. This is theoretically optimal if all probabilities are equally likely a priori. But in practice we often find that this offset is too high. For context models trained on text, we find experimentally that offsets of 0.03 to 0.05 are better estimators, e.g. p = (0+0.03)/(3+0.06) ≈ 0.01. You often have to find these values experimentally.

-- Matt Mahoney, [email protected]
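The counting-with-offset rule above can be sketched in a few lines of Python. The function name and its defaults are my own illustration, not from the post; the offsets are the ones quoted above:

```python
# Sketch: additive-offset estimators for the zero frequency problem
# on binary outcome counts. offset=1.0 gives the Laplace estimator;
# small offsets (0.03-0.05) are the values found to work better for
# text context models in data compression, per the discussion above.

def estimate(successes, trials, offset=1.0, symbols=2):
    """Estimate p(success) as (successes + offset) / (trials + symbols*offset)."""
    return (successes + offset) / (trials + symbols * offset)

# Case 3 from the post: zero successes in three trials.
print(estimate(0, 3))               # Laplace: (0+1)/(3+2) = 0.2
print(estimate(0, 3, offset=0.03))  # (0+0.03)/(3+0.06), roughly 0.01
```

Note that counts of zero never yield a probability of exactly 0, which is the whole point: a symbol never yet seen must still get nonzero probability mass.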
