There's a problem with the current method of calculating the estimators of binary variables using exponential decay. The problem is this: If a binary variable has a true average value near 0 or 1, the exponential decay function will frequently compute the estimate WAY off from its "true" value.

For example, suppose that the pTransferFailed for a particular node is "truly" 0.01. That is, it fails 1 in 100 tries and it acts like a true random variable, s.t. it doesn't normally do things like succeeding for 990 and then failing for 10 in a row. (Given that we have some kind of working backoff, I would think this a reasonable assumption for the variables used in estimate(). It is not a good assumption for variables like pConnectionSuccess, whose "true" values change drastically depending on whether the person is actually connected.)

Continuing the example, with decayFactor=0.2 suppose the DecayingRunningAverage (DRA) happens to actually be 0.01 at some point. If we get a failure, the DRA will suddenly shoot to 0.208. If we then get 99 successes after that, the DRA will drop to 0.208*0.8^99 = 5.3e-11. If we then get another failure, the average shoots back up to about 0.2. It will almost never be close to the actual value.
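
To make the swing concrete, here's a minimal Java sketch of the current scheme (not the actual DecayingRunningAverage code, just made-up names, using the 0.2 decay factor from the example):

// Sketch only: plain exponential-decay average as described above.
public class PlainDecayingAverage {
    private final double decayFactor;
    private double value;

    public PlainDecayingAverage(double decayFactor, double initialValue) {
        this.decayFactor = decayFactor;
        this.value = initialValue;
    }

    // For pTransferFailed: report(1.0) on a failure, report(0.0) on a success.
    public void report(double outcome) {
        value = value * (1.0 - decayFactor) + outcome * decayFactor;
    }

    public double currentValue() { return value; }

    public static void main(String[] args) {
        PlainDecayingAverage dra = new PlainDecayingAverage(0.2, 0.01);
        dra.report(1.0);                                 // one failure
        System.out.println(dra.currentValue());          // 0.208
        for (int i = 0; i < 99; i++) dra.report(0.0);    // 99 successes
        System.out.println(dra.currentValue());          // ~5.3e-11
        dra.report(1.0);                                 // one more failure
        System.out.println(dra.currentValue());          // back up to ~0.2
    }
}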

There's a real problem here because estimate() depends on the accuracy of such estimators. If the time cost of a transfer failure (for example) is significant, then we must properly account for it even when the probability of failure is low. As it is, we might be counting the cost of a transfer failure as 20x greater than we should, or else as practically zero.

What might be the solution to this? I think the problem lies in the fact that the variable being measured has a Bernoulli distribution, rather than something "nicer" like a normal distribution. The most obvious and simple solution I can think of is to let decayFactor depend on the DRA for binary variables.

Let DDRA = the decaying decaying running average
Let decayingDecayFactor = decayFactor * MIN(DDRA, 1-DDRA)
Then compute the new decaying decaying running average as:
DDRA := DDRA*(1-decayingDecayFactor) + [1 or 0]*decayingDecayFactor
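
In code, the proposed update would look something like this (again just a sketch with made-up names, not a patch against the existing averaging classes):

// Sketch only: exponential decay whose decay factor shrinks near 0 and 1.
public class BinaryDecayingAverage {
    private final double decayFactor;   // e.g. 0.2
    private double ddra;                // the decaying decaying running average

    public BinaryDecayingAverage(double decayFactor, double initialValue) {
        this.decayFactor = decayFactor;
        this.ddra = initialValue;
    }

    // outcome is 1.0 or 0.0 (e.g. 1.0 = transfer failed, for pTransferFailed).
    public void report(double outcome) {
        // Scale the decay factor down as the average approaches 0 or 1, so a
        // single rare outcome can't throw the estimate far off its true value.
        // (If ddra ever hit exactly 0 or 1 this would freeze; in practice a
        // tiny floor on decayingDecayFactor might be wanted.)
        double decayingDecayFactor = decayFactor * Math.min(ddra, 1.0 - ddra);
        ddra = ddra * (1.0 - decayingDecayFactor) + outcome * decayingDecayFactor;
    }

    public double currentValue() { return ddra; }
}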

Let's see how this improves the situation with pTransferFailed described above. Starting at DDRA=0.01, decayingDecayFactor is 0.002. If we get a failure, DDRA will rise to about 0.012. If we then get 99 successes after that, DDRA will drop to about 0.012*0.998^99 = 0.00984, or approximately 0.01, which is what we want. (Note that I didn't take into account that decayingDecayFactor changes during this run, but that effect is minor.)
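
Feeding the same sequence into the sketch above gives much the same picture (the exact figure comes out slightly below 0.00984, because the code recomputes decayingDecayFactor on every update, which the hand calculation ignores):

BinaryDecayingAverage p = new BinaryDecayingAverage(0.2, 0.01);
p.report(1.0);                                // one failure: ~0.012
for (int i = 0; i < 99; i++) p.report(0.0);   // 99 successes
System.out.println(p.currentValue());         // stays close to 0.01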

It might be objected that this change adapts too slowly. Suppose the "true" pTransferFailed is 0.5, but we've somehow managed to get ourselves stuck with DDRA=0.01. How long will it take to reach a value near DDRA=0.5? I computed this using Excel, assuming alternating successes and failures: it reaches 0.45 after 65 trials, and just one more failure puts it at 0.50. That seems acceptable to me, especially since we don't expect to get stuck with a DDRA so far from the true value.
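
The same sketch can be used to sanity-check that recovery figure (my made-up class again; the 65-trial number above came from the Excel calculation, and the exact count here depends on whether the alternation starts with a failure or a success):

// Worst case: true rate is 0.5, but the average starts stuck at 0.01.
BinaryDecayingAverage stuck = new BinaryDecayingAverage(0.2, 0.01);
int trials = 0;
while (stuck.currentValue() < 0.45) {
    stuck.report(trials % 2 == 0 ? 1.0 : 0.0);  // alternate failure/success
    trials++;
}
System.out.println("reached " + stuck.currentValue() + " after " + trials + " trials");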

Whatcha think?

-Martin

