On Fri, Sep 26, 2014 at 2:32 PM, Stanley Nilsen via AGI <[email protected]> 
wrote:
> 4. probabilities
>     Matt states a fairly simple equation (utility * probability < cost) that
> he calls the answer.  I'm not sure it's as simple as indicated - in fact I'm
> sure it is not.

It is not simple because estimating probabilities is not simple.
Suppose you do an experiment some number of times and observe the
following outcomes, where 1 = success and 0 = failure. For each of the
following sequences, estimate the probability of success on the next
trial:

1. 0110100101
2. 0000011111
3. 000
4. 0

In case 1 we observe 5 successes in 10 trials, so we guess p = 0.5.

In case 2 we again observe 5 successes in 10 trials, but we are no
longer justified in assuming that the trials are independent. We would
assume p > 0.5 on the basis of a universal distribution, i.e. the
shortest program that generates the data is the most likely
explanation. For example, a short program that outputs 5 zeros
followed by all ones would predict another 1 on the next trial.

Cases 3 and 4 are examples of the zero frequency problem. This has
received a lot of study in predictive modeling for data compression.
Simply counting zeros and ones is wrong because probabilities are
never exactly 0. A Laplace estimator adds 1 to each count, so for case
3, instead of p = 0/3 we have p = (0+1)/(3+2) = 0.2. This is
theoretically optimal if all probabilities are equally likely a
priori. But in practice we often find that this offset is too high.
For context models trained on text, offsets of 0.03 to 0.05 turn out
experimentally to be better estimators, e.g. p = (0+0.03)/(3+0.06)
~= 0.01. The best values generally have to be found by experiment.
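To make the arithmetic concrete, here is a minimal sketch of this
kind of additive smoothing. The function name and interface are my
own; offset = 1 gives the Laplace estimator, and a small offset like
0.03 gives the kind of estimate used in text context models:

```python
def smoothed_estimate(successes, failures, offset=1.0):
    """Additive smoothing: add `offset` to each of the two counts.

    offset=1.0 is the Laplace estimator (optimal if all values of p
    are equally likely a priori); small offsets like 0.03-0.05 are
    experimentally better for context models trained on text.
    """
    return (successes + offset) / (successes + failures + 2 * offset)

# Case 3: three failures, no successes.
print(smoothed_estimate(0, 3))        # Laplace: (0+1)/(3+2) = 0.2
print(smoothed_estimate(0, 3, 0.03))  # (0+0.03)/(3+0.06), about 0.01
```

With offset = 0 this degenerates to raw frequency counting, which
assigns probability exactly 0 in cases 3 and 4.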

-- 
-- Matt Mahoney, [email protected]

