Ian Clarke wrote:

Jonathan Howard wrote:

That shows what I said first, and it depends (in order to work) on
tSuccess < tFailure, as desired.

(pSuccess + pFailure) = 1
tSuccess ~= tFailure  ->


Actually, no: tFailure should typically be much larger than tSuccess, since it includes the estimated time to re-request the data from somewhere else (along with the time for the failure to occur).

I can't resist :-)
Who is your money on, the tortoise or the hare?


My money is on whichever one is probabilistically going to return some data soonest (which includes the time required for the dog to fetch it if the hare comes back empty-handed).

Mine too. We want to maximize the amount of data returned within a fixed amount of time. Unfortunately, StandardNodeEstimator.estimate() does not recognize the advantage of a consistently good but slow node over a consistently bad but fast one. Below I derive how to decide which "dog" to place your bets on: that is, how to calculate the expected number of successes per unit of time.


Let's say that tSuccess is the number of time units it takes to return one data unit when a request succeeds, and tFailure the number it takes when a request fails. Imagine some long stretch of time t during which we repeatedly make requests of the node. That time is filled by a sequence of intervals: say nSuccess of them have length tSuccess, and nFailure have length tFailure (of course, it's more random than that, but I'm keeping it simple). Then,

t = nSuccess*tSuccess + nFailure*tFailure

We would expect the ratio of success intervals to failure intervals to be pSuccess/pFailure, so we can write

nSuccess/nFailure = pSuccess/pFailure

We want nSuccess/t, the number of successes per unit of time. From the second equation, nFailure = nSuccess*pFailure/pSuccess; substituting that into the first equation gives t = nSuccess*(tSuccess + tFailure*pFailure/pSuccess), and therefore:

nSuccess/t = 1/(tSuccess + tFailure*pFailure/pSuccess)
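As a sanity check on that formula, here is a small simulation (illustrative Java, not Freenet code; the class and method names are mine) that draws successes with probability pSuccess, accumulates interval lengths, and compares the observed success rate against the prediction:

```java
import java.util.Random;

// Simulates repeated requests against one node and measures successes
// per unit of time; it should land close to the predicted
// 1/(tSuccess + tFailure*pFailure/pSuccess).
public class RequestSimulation {

    static double simulate(double pSuccess, double tSuccess, double tFailure,
                           int requests, long seed) {
        Random rng = new Random(seed);
        double t = 0;        // total elapsed time
        int nSuccess = 0;    // number of successful requests
        for (int i = 0; i < requests; i++) {
            if (rng.nextDouble() < pSuccess) {
                nSuccess++;
                t += tSuccess;
            } else {
                t += tFailure;
            }
        }
        return nSuccess / t; // observed successes per unit of time
    }

    public static void main(String[] args) {
        // pSuccess 0.1, tSuccess 1000, tFailure 2000: predicted 1/19000
        double predicted = 1.0 / (1000 + 2000 * 0.9 / 0.1);
        double simulated = simulate(0.1, 1000, 2000, 1_000_000, 42);
        System.out.printf("predicted %.6e, simulated %.6e%n", predicted, simulated);
    }
}
```

With a million simulated requests the two values agree to well under a percent.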

Let's take Jonathan's example:
>Node 1 - pSuccess 0.1 tSuccess 1000 pFailure 0.9 tFailure 2000
>estimate = 1900
>
>Node 2 - pSuccess 0.9 tSuccess 5000 pFailure 0.1 tFailure 10000
>estimate = 5500

For node 1, nSuccess/t = 1/(1000+0.9*2000/0.1)  = 1/19000
For node 2, nSuccess/t = 1/(5000+0.1*10000/0.9) = 1/6111

So, node 2 really is better than node 1 since it returns more data per unit of time.

We might change estimate() to calculate nSuccess/t instead of pSuccess*tSuccess + pFailure*tFailure, but then we would have to prefer nodes with higher estimates rather than lower ones. So, to keep the programming simple, calculate the reciprocal:

t/nSuccess = tSuccess + tFailure*pFailure/pSuccess
           = (pSuccess*tSuccess + pFailure*tFailure)/pSuccess
           = current_estimate() / pSuccess

Aha! The current calculation of estimate needs only a small modification: divide it by pSuccess.
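A sketch of what that change could look like (the class, method, and parameter names here are my own for illustration; I haven't matched the actual StandardNodeEstimator code):

```java
// Illustrative sketch of the proposed change, not the real Freenet source.
public class EstimateSketch {

    // What estimate() computes today: pSuccess*tSuccess + pFailure*tFailure.
    static double currentEstimate(double pSuccess, double tSuccess,
                                  double pFailure, double tFailure) {
        return pSuccess * tSuccess + pFailure * tFailure;
    }

    // Proposed: divide by pSuccess, giving t/nSuccess, the expected time
    // per successful retrieval. Lower is still better.
    static double newEstimate(double pSuccess, double tSuccess,
                              double pFailure, double tFailure) {
        return currentEstimate(pSuccess, tSuccess, pFailure, tFailure) / pSuccess;
    }

    public static void main(String[] args) {
        // Node 1: old estimate 1900, new estimate 19000
        System.out.println(newEstimate(0.1, 1000, 0.9, 2000));
        // Node 2: old estimate 5500, new estimate ~6111
        System.out.println(newEstimate(0.9, 5000, 0.1, 10000));
    }
}
```

Under the new estimate, node 2 (~6111) beats node 1 (19000), matching the nSuccess/t ranking in the worked example.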



-Martin

P.S. Jonathan, you were half right when you said to change:

>174: estimate += pSuccess * tSuccess;
>to
>estimate += tSuccess;

since you also have to change estimate += pFailure*tFailure to estimate += pFailure*tFailure/pSuccess.
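For what it's worth, here's a standalone check (again with illustrative names, not the real source) that the per-term version in this P.S. and the divide-the-old-estimate version agree:

```java
// Checks that the two ways of writing the new estimate agree:
//   tSuccess + pFailure*tFailure/pSuccess
// vs
//   (pSuccess*tSuccess + pFailure*tFailure)/pSuccess
public class LineChangeCheck {

    // Form A: change both += lines, as in the P.S.
    static double formA(double pSuccess, double tSuccess,
                        double pFailure, double tFailure) {
        return tSuccess + pFailure * tFailure / pSuccess;
    }

    // Form B: leave estimate() alone and divide its result by pSuccess.
    static double formB(double pSuccess, double tSuccess,
                        double pFailure, double tFailure) {
        return (pSuccess * tSuccess + pFailure * tFailure) / pSuccess;
    }

    public static void main(String[] args) {
        System.out.println(formA(0.9, 5000, 0.1, 10000)); // ~6111.1
        System.out.println(formB(0.9, 5000, 0.1, 10000)); // ~6111.1
    }
}
```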


_______________________________________________ Devl mailing list [EMAIL PROTECTED] http://dodo.freenetproject.org/cgi-bin/mailman/listinfo/devl
