On Tue, Dec 02, 2003 at 03:17:22PM -0800, Martin Stone Davis wrote:
Toad wrote:
Another suggestion, from Toast: estimate the latest request's rank compared to the last n requests (100 for example). Compare the position in this list with the current load (using Ian's method or the standard method). Decide based on that. The number 100 is not alchemy, as it affects the precision of routing, not routing itself.
We keep an LRU list of the last N (say 100) requests, and their estimates. This is kept as 2 doubly linked lists, one for the LRU, another kept in order of the estimates.
When we get a request:
- Calculate our own estimate.
- Synchronize on the lists' sync object.
- Find the LRU request and remove it from both lists.
- Walk the list sorted by estimates and insert ourselves in the appropriate place. Remember our position numerically - the number of entries before us in the list.
- Insert ourselves at the new end of the LRU.
- Let X = the target fraction accepted. If our rank on the list was worse than X * N, reject the request; otherwise accept it.
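A minimal sketch of that scheme, with synchronized methods standing in for "the lists' sync object"; the class and method names below are illustrative, not actual Freenet code:

import java.util.LinkedList;
import java.util.ListIterator;

/**
 * Sketch only: remember the last N request estimates, rank the incoming
 * request against them, and reject it if it falls in the worst part of
 * the window. Names are assumptions, not the real Freenet classes.
 */
public class EstimateRankLimiter {

    private final int n;                 // window size, e.g. 100
    private final double targetFraction; // X = target fraction accepted

    // LRU order: oldest estimate at the head, newest at the tail.
    private final LinkedList<Double> lru = new LinkedList<Double>();
    // The same estimates, kept sorted ascending (best = lowest first).
    private final LinkedList<Double> sorted = new LinkedList<Double>();

    public EstimateRankLimiter(int n, double targetFraction) {
        this.n = n;
        this.targetFraction = targetFraction;
    }

    /** Returns true if a request with this estimate should be accepted. */
    public synchronized boolean shouldAccept(double estimate) {
        // Drop the least-recently-used entry from both lists once full.
        if (lru.size() >= n) {
            Double oldest = lru.removeFirst();
            sorted.remove(oldest);
        }
        // Walk the sorted list, insert ourselves, and remember our rank:
        // the number of entries with a better (lower or equal) estimate.
        int rank = 0;
        ListIterator<Double> it = sorted.listIterator();
        while (it.hasNext()) {
            if (it.next() > estimate) {
                it.previous();
                break;
            }
            rank++;
        }
        it.add(estimate);
        // Record ourselves as the most recent request.
        lru.addLast(estimate);
        // Reject if our rank is worse than X * N, otherwise accept.
        return rank <= targetFraction * n;
    }
}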
Toast's idea is good. I have one adjustment. The values in the list we look through must be the best estimates *of the nodes which are currently not backed off*.
Good point. We would use the best estimate in the RT of the nodes not backed off. One interesting question is what to do about file sizes - the obvious thing would be to estimate on the basis of a standard size rather than the actual size in the key. If we use the actual size from the key, that would exert a massive selective pressure against large keys, which of course take a lot longer to transfer.
So use estimate()/sizeOfKey? That makes sense.
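As an aside on the size question, here is a tiny hedged sketch of the two normalizations being discussed; estimateForSize(), the Estimator interface and STANDARD_SIZE are assumptions, not real Freenet API:

/**
 * Illustrative only: rank requests by a size-normalized estimate so that
 * large keys are not selected against.
 */
public class SizeNormalizedEstimate {

    static final long STANDARD_SIZE = 256 * 1024; // hypothetical reference size, in bytes

    interface Estimator {
        long estimateForSize(long sizeBytes); // estimated cost for a key of this size
    }

    /** Estimate against a fixed standard size rather than the key's actual size. */
    static double standardSizeEstimate(Estimator e) {
        return e.estimateForSize(STANDARD_SIZE);
    }

    /** The estimate()/sizeOfKey variant: cost per byte for the actual key. */
    static double perByteEstimate(Estimator e, long actualKeySize) {
        return e.estimateForSize(actualKeySize) / (double) actualKeySize;
    }
}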
Returning to the point about backed-off nodes: for example, say we have one "fast" node on our RT whose estimate() is very small for all keys, whereas the rest of the nodes yield much higher estimate()s. Suppose the last N requests were all sent to the fast node. We then get a QR from that node and consider it backed off. Now, another query comes in. Using the technique Toast and Toad described, we would almost certainly reject the request, since its estimate() would rank poorly against the last N. But that is only because the remaining nodes in our routing table are slow. We should instead re-evaluate the estimate()s of the last N requests based on the currently available nodes.
Hmm. That's more tricky. You think we should store all the estimates for each request? That would use significantly more memory and CPU... but it needn't do a full estimate for each...
Here are two ideas for reducing the required memory and/or CPU usage:
1. Keep track of only the 1st and 2nd best nodes for each query.
Then, each time we start to process a query, check whether the set of nodes which are backed off has changed. If exactly the same nodes are available, don't bother recomputing estimates. If any node has backed off or become available since the query was processed, then recompute the estimates, as follows.
When we do this, check to see if the node that was originally considered "best" is still available. If so, we only have to consider that node and the nodes that are no longer backed off. If the node that was originally considered best is no longer available, check the one that was considered 2nd best. If that is still available, then again, we only have to consider that node and the nodes that are no longer backed off. If neither is available, then we have to do a full recomputation of estimate() for all available nodes.
2. Stop computing estimate() once we have exceeded the values of both the best (so far) and 2nd best available nodes. To do this, we could add a cutoff argument to estimate() and just return as soon as the value is exceeded. When we don't want this behavior, set the argument to MAX_LONG.
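A combined sketch of both ideas, assuming a hypothetical NodeRef interface with an isBackedOff() flag and an estimate() that takes a cutoff; none of these names are the actual Freenet API, and after a partial recomputation the cached 2nd best is only approximate, as in the description above:

import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Per-request cache of the best and 2nd best nodes (idea 1), re-estimating
 * only when the set of backed-off nodes changes, plus an early-exit cutoff
 * passed to estimate() (idea 2). All names here are assumptions.
 */
public class CachedRequestEstimate {

    /** Placeholder for a routing key. */
    public static final class Key { }

    public interface NodeRef {
        /**
         * Estimated cost of routing this key via the node. May return any
         * value above cutoff as soon as its running estimate exceeds it;
         * pass Long.MAX_VALUE ("MAX_LONG") to force a full estimate.
         */
        long estimate(Key key, long cutoff);
        boolean isBackedOff();
    }

    private final Key key;
    private NodeRef best, secondBest;
    private long bestEstimate = Long.MAX_VALUE;
    private Set<NodeRef> backedOffLastTime;

    public CachedRequestEstimate(Key key) {
        this.key = key;
    }

    /** Best estimate over the nodes that are currently not backed off. */
    public long currentEstimate(List<NodeRef> routingTable) {
        Set<NodeRef> backedOff = new HashSet<NodeRef>();
        for (NodeRef node : routingTable)
            if (node.isBackedOff()) backedOff.add(node);

        if (backedOff.equals(backedOffLastTime))
            return bestEstimate; // nothing changed: reuse the cached value

        // Nodes that were backed off last time but are available now.
        Set<NodeRef> candidates = new HashSet<NodeRef>();
        if (backedOffLastTime != null) {
            candidates.addAll(backedOffLastTime);
            candidates.removeAll(backedOff);
        }

        if (best != null && !best.isBackedOff()) {
            candidates.add(best);       // old best still beats every unchanged node
        } else if (secondBest != null && !secondBest.isBackedOff()) {
            candidates.add(secondBest); // old 2nd best is the best survivor
        } else {
            // Neither is available (or first run): full recomputation.
            candidates.clear();
            for (NodeRef node : routingTable)
                if (!node.isBackedOff()) candidates.add(node);
        }
        recompute(candidates);
        backedOffLastTime = backedOff;
        return bestEstimate;
    }

    private void recompute(Set<NodeRef> candidates) {
        best = secondBest = null;
        bestEstimate = Long.MAX_VALUE;
        long secondBestEstimate = Long.MAX_VALUE;
        for (NodeRef node : candidates) {
            // Idea 2: a node whose partial estimate already exceeds the
            // current 2nd best cannot change the answer, so it may bail out.
            long e = node.estimate(key, secondBestEstimate);
            if (e < bestEstimate) {
                secondBest = best;
                secondBestEstimate = bestEstimate;
                best = node;
                bestEstimate = e;
            } else if (e < secondBestEstimate) {
                secondBest = node;
                secondBestEstimate = e;
            }
        }
    }
}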
-Martin
