Toad wrote:

On Sun, Nov 30, 2003 at 10:09:40AM -0800, Martin Stone Davis wrote:

Ian Clarke wrote:


Ok, the new stable build seems to be working quite well. Are other people experiencing the same thing?

We need to take stock of the situation with NGR. I think one problem has been a willingness to dream up solutions, and implement them, before actually understanding what the problem is.

I would like to propose that now that the time-pressure is off, we try to be more cautious - we need to form theories about what the problem is, figure out how to test these theories, and if they prove true, *then* we implement a solution.

Well, okay, but how does that relate to the plan I outlined (http://article.gmane.org/gmane.network.freenet.devel/8191)?


I understand why you would want more proof that we should drop HTL in favor of time-to-live (Toad's idea, which I support). To do so *right now* would be a big change, since we still have many details to be filled in (http://article.gmane.org/gmane.network.freenet.devel/8184). Therefore we should only do it if we're confident that it will cure a major ill.


Unfortunately, we can't. Well, we can't easily, because there will be a nonlinear relationship between pDNF and the time-to-live value.

Right. That's why I was having difficulty filling in the details on your original plan.

One possible solution is to use something more sophisticated than an estimator (some sort of machine learning algo maybe).

Well, it would still be an estimator. It would just be a more sophisticated estimator.

Another solution
would be to not have a variable timeout. Requests die when they go to
all *available* nodes (which is limited under load by unobtanium [ only
accepting requests we have a chance of completing ])...

Nice. That certainly would make calculating estimate() easier, since we wouldn't have to worry about accounting for time-to-live. Which, of course, is your point. The other benefit is that it removes an alchemical parameter.


How does a node decide when a request has been killed? What if a node accepts a query and never returns a DNF or a QR? What if it just keeps saying "I'm working on it..."?

I think the best thing we could do is the following. Compute estimatedSearchTimeForThisNode = estimatedSearchTime(), the amount of time we expect this node to take before it tells us it has found the data. Then, once the node has gone longer than QUERY_SEARCH_DEAD_FACTOR*estimatedSearchTimeForThisNode milliseconds without starting to send a trailer, we start considering whether the query is dead. To do *that*, we check the other available nodes and compute estimatedSearchTime() for the best available node (still using estimate() to determine which node is best). When timeElapsed > QUERY_SEARCH_DEAD_FACTOR * MAX(estimatedSearchTimeForThisNode, estimatedSearchTimeForBestAvailableNode), we consider the query dead and try the best available node. The only alchemy there is the single configurable parameter, QUERY_SEARCH_DEAD_FACTOR. (I'd recommend we try QUERY_SEARCH_DEAD_FACTOR=2.)
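
In code, the check would look something like this (a rough Java sketch only; estimatedSearchTime(), estimate() and every name below are placeholders for whatever the real estimators end up being called, not anything that exists in the tree yet):

// Rough sketch only: estimatedSearchTime(), estimate() and all the names here
// are placeholders, not existing Freenet code.
public class QuerySearchDeath {

    // The one configurable alchemy parameter; I'd start with 2.
    static final double QUERY_SEARCH_DEAD_FACTOR = 2.0;

    /** The bit of a node's estimator this check needs. */
    interface NodeEstimator {
        double estimatedSearchTime(); // ms until we expect "data found"
    }

    /**
     * @param timeElapsedMs  how long the current node has gone without starting a trailer
     * @param thisNode       estimator for the node currently handling the query
     * @param bestAvailable  estimator for the best available alternative (chosen via estimate())
     */
    static boolean queryIsDead(long timeElapsedMs,
                               NodeEstimator thisNode,
                               NodeEstimator bestAvailable) {
        double estThis = thisNode.estimatedSearchTime();
        // Cheap first check: don't even look at alternatives until this node
        // has gone well past its own estimate.
        if (timeElapsedMs <= QUERY_SEARCH_DEAD_FACTOR * estThis)
            return false;
        // Full check: don't give up sooner than the best alternative could
        // plausibly have answered either.
        double estBest = bestAvailable.estimatedSearchTime();
        return timeElapsedMs > QUERY_SEARCH_DEAD_FACTOR * Math.max(estThis, estBest);
    }
}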

In fact, we could use the same sort of philosophy to kill transfers whose transfer rate is too low. Once bytesReceived/timeElapsedSinceSearchSuccess < (1/OVERALL_TRANSFER_RATE_DEAD_FACTOR) * MAX(estimatedRawTransferRateForThisNode,estimatedEffectiveTransferRateForBestAvailableNode), we kill the transfer and go with the best available node. (The difference between a "raw" transfer rate and an "effective" transfer rate is that the "effective" rate accounts for the additional time required to search for the key.)
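
Sketched the same way (again, the estimator names are made up, and the value of OVERALL_TRANSFER_RATE_DEAD_FACTOR is something we would still have to pick):

// Same caveat: a sketch with made-up names. The estimated*Rate values would
// come from the node estimators; "effective" folds in the time a fresh search
// at that node would take.
public class TransferDeath {

    /**
     * @param bytesReceived                    bytes received since the search succeeded
     * @param timeElapsedSinceSearchSuccessMs  ms elapsed since the search succeeded
     * @param estRawRateThisNode               estimated raw transfer rate of this node (bytes/ms)
     * @param estEffectiveRateBestAvail        estimated effective rate of the best available node (bytes/ms)
     * @param overallTransferRateDeadFactor    OVERALL_TRANSFER_RATE_DEAD_FACTOR
     */
    static boolean transferIsDead(long bytesReceived,
                                  long timeElapsedSinceSearchSuccessMs,
                                  double estRawRateThisNode,
                                  double estEffectiveRateBestAvail,
                                  double overallTransferRateDeadFactor) {
        double observedRate = bytesReceived / (double) timeElapsedSinceSearchSuccessMs;
        double floor = Math.max(estRawRateThisNode, estEffectiveRateBestAvail)
                / overallTransferRateDeadFactor;
        return observedRate < floor;
    }
}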

And, instead of using the transfer rate since the beginning of the transfer, we might also consider just the RECENT transfer rate. That way we don't wait a long time while a node which has transferred all but the last 1k of a 1MB key stalls forever. I guess that would involve two configurable parameters: the amount of time that counts as "recent", and a RECENT_TRANSFER_RATE_DEAD_FACTOR.
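
A minimal sketch of how that recent-rate bookkeeping might work (the class and both knobs are hypothetical):

import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sliding-window tracker; nothing here exists yet, and the
// window length is the "what counts as recent" knob.
class RecentTransferRate {
    private final long windowMs;                                    // what counts as "recent"
    private final Deque<long[]> samples = new ArrayDeque<long[]>(); // {timestampMs, bytes}
    private long bytesInWindow;

    RecentTransferRate(long windowMs) {
        this.windowMs = windowMs;
    }

    /** Record a chunk of the trailer as it arrives. */
    synchronized void received(long nowMs, long bytes) {
        samples.addLast(new long[] { nowMs, bytes });
        bytesInWindow += bytes;
        expire(nowMs);
    }

    /** Bytes per millisecond over just the recent window. */
    synchronized double recentRate(long nowMs) {
        expire(nowMs);
        return bytesInWindow / (double) windowMs;
    }

    private void expire(long nowMs) {
        while (!samples.isEmpty() && samples.peekFirst()[0] < nowMs - windowMs) {
            bytesInWindow -= samples.removeFirst()[1];
        }
    }
}

The kill test would then have the same shape as the overall one above, just using recentRate() and RECENT_TRANSFER_RATE_DEAD_FACTOR instead.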

Given the above, doesn't it also mean that we can throw away failure tables??? Am I going nuts here? What would be the use of failure tables in such a system?

If I'm right about all that, then we have all the details needed for implementation and we now just need a name for it. Not hops-to-live, not time-to-live... how about indefinite-to-live, or iTL? Looks kinda Macintoshy. Hip.

I've re-convinced myself that we (you) should go ahead and implement iTL, Ian's backoff, unobtanium, and the new estimate() all at once, ASAP after you've gotten 0.5.3 out the door.

Whatcha think?

-Martin


