Toad wrote:

On Mon, Dec 01, 2003 at 10:52:52PM -0800, Martin Stone Davis wrote:

Toad wrote:


On Sun, Nov 30, 2003 at 10:09:40AM -0800, Martin Stone Davis wrote:


Ian Clarke wrote:



Ok, the new stable build seems to be working quite well. Are other people experiencing the same thing?

We need to take stock of the situation with NGR. I think one problem has been a willingness to dream up solutions, and implement them, before actually understanding what the problem is.

I would like to propose that now that the time-pressure is off, we try to be more cautious - we need to form theories about what the problem is, figure out how to test these theories, and if they prove true, *then* we implement a solution.

Well, okay, but how does that relate to the plan I outlined (http://article.gmane.org/gmane.network.freenet.devel/8191)?


I understand why you would want more proof that we should drop HTL in favor of time-to-live (Toad's idea, which I support). To do so *right now* would be a big change, since we still have many details to be filled in (http://article.gmane.org/gmane.network.freenet.devel/8184). Therefore we should only do it if we're confident that it will cure a major ill.


Unfortunately, we can't. Well, we can't easily. Because there will be a
nonlinear relationship between pDNF and the time to live value.

Right. That's why I was having difficulty filling in the details on your original plan.


However, there is a nonlinear relationship between pDNF and the HTL now.
We ignore it...
True.


One
possible solution is to use something more sophisticated than an
estimator (some sort of machine learning algo maybe).

Well, it would still be an estimator. It would just be a more sophisticated estimator.



Another solution
would be to not have a variable timeout. Requests die when they go to
all *available* nodes (which is limited under load by unobtanium [ only
accepting requests we have a chance of completing ])...

Nice. That certainly would make calculating estimate() easier, since we wouldn't have to worry about accounting for time-to-live. Which, of course, is your point. The other benefit is that it removes an alchemical parameter.


Definitely.

How does a node decide when a request was killed? What if a node accepts a query and never returns a DNF or a QR? What if it just keeps saying "I'm working on it..."?


We have timeouts... not an overall request timeout, a timeout for a
response, within the request. For example, if we don't get an Accepted
within 1 hop time, we timeout and use another node, in the current code.
Okay, I didn't know that.
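The per-message timeout Toad describes could be sketched roughly as follows (Python; `FakeNode`, `send_with_timeout`, and `HOP_TIME_MS` are illustrative names for this sketch, not Freenet's actual API):

```python
import time

HOP_TIME_MS = 500  # assumed estimate of one hop's round-trip time


class FakeNode:
    """Stand-in for a peer connection; records what was sent."""
    def __init__(self):
        self.sent = []

    def send(self, msg):
        self.sent.append(msg)


def send_with_timeout(node, message, wait_for_reply, timeout_ms=HOP_TIME_MS):
    """Send `message` to `node` and wait up to `timeout_ms` for a reply.

    Returns the reply (e.g. an Accepted), or None on timeout so the
    caller can route the request to the next-best node instead.
    """
    deadline = time.monotonic() + timeout_ms / 1000.0
    node.send(message)
    while time.monotonic() < deadline:
        reply = wait_for_reply(timeout=deadline - time.monotonic())
        if reply is not None:
            return reply
    return None  # caller treats this as a timeout and tries another node
```

The point is that the timeout is per response within the request, not one overall request timeout.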

All that is already written, but many parts would have to be rewritten
due to no HTL.

Doesn't the below QUERY_DEAD_FACTOR stuff amount to a sensible rewrite of it then?



I think the best thing we could do is the following: compute estimatedSearchTimeForThisNode = estimatedSearchTime(), the amount of time we expect this node to take to tell us it has found the data. Then, once a node has gone longer than QUERY_SEARCH_DEAD_FACTOR*estimatedSearchTimeForThisNode milliseconds without starting to send a trailer, we start considering whether the query is dead. To do *that*, we check the other available nodes and compute estimatedSearchTime() for the best available node (still using estimate() to determine which node is best). When timeElapsed > QUERY_SEARCH_DEAD_FACTOR * MAX(estimatedSearchTimeForThisNode,estimatedSearchTimeForBestAvailableNode), we consider the query dead, and try the best available node. The only alchemy there is the one configurable parameter, QUERY_SEARCH_DEAD_FACTOR. (I'd recommend we start with QUERY_SEARCH_DEAD_FACTOR=2.)
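The decision rule above can be written as a one-liner; this is just a sketch of the condition as stated, with the names taken from the paragraph (the factor's value of 2 is Martin's suggested starting point, not a measured one):

```python
QUERY_SEARCH_DEAD_FACTOR = 2.0  # suggested starting value; pure alchemy


def query_is_dead(time_elapsed_ms, est_this_node_ms, est_best_available_ms):
    """Return True when the in-flight query should be considered dead.

    The query is only killed once it has overrun the *larger* of the two
    estimates by the dead factor, so we never abandon the current node in
    favour of an alternative we expect to be nearly as slow.
    """
    threshold = QUERY_SEARCH_DEAD_FACTOR * max(est_this_node_ms,
                                               est_best_available_ms)
    return time_elapsed_ms > threshold
```

Taking the MAX of the two estimates is what keeps the rule conservative: a stalled node is only ditched when the best alternative genuinely looks faster.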


Hmmm.

In fact, we could use the same sort of philosophy to kill transfers that have a too-low transfer rate. Once bytesReceived/timeElapsedSinceSearchSuccess < (1/OVERALL_TRANSFER_RATE_DEAD_FACTOR) * MAX(estimatedRawTransferRateForThisNode,estimatedEffectiveTransferRateForBestAvailableNode), we kill the transfer and go with the best available node. (The difference between a "raw" transfer rate and an "effective" transfer rate is that the "effective" rate accounts for the additional time required to search for the key.)


I don't like killing transfers. Unless we have a really good reason.

The really good reason in this case is that a node which promised us lots and lots of sex is only givin' it up when she's on her period now. And *also* because there's this hot chick at the office who's promising to do it with us. Gawd I love those sex metaphors.



Besides, why didn't we go with the best available node in the first place?!

The best available node is the best available one RIGHT NOW. That changes with time due to backoff. If the node we are currently querying/transferring with goes south and there's another node available which we think will do _much_ better than the current one *actually is* doing, then we figure it's to our advantage to ditch the current one and go with the best available. How much is "much" is the configurable factor.



And, instead of using the transfer rate since the beginning of the transfer, we might also consider just the RECENT transfer rate. That way we don't wait a long time while a node which transferred all but the last 1k of a 1MB key stalls forever. I guess that would involve two configurable parameters: the amount of time that counts as "recent", and the RECENT_TRANSFER_RATE_DEAD_FACTOR.
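A recent-rate tracker along those lines might look like this sketch (the window length and class name are illustrative assumptions; samples older than the window are dropped, so a tail-end stall shows up immediately even when the overall average still looks healthy):

```python
from collections import deque

RECENT_WINDOW_S = 30.0  # assumed "recent" window; one of the two knobs


class RecentRate:
    """Tracks (timestamp, byte-count) samples and reports the transfer
    rate over only the last `window_s` seconds."""

    def __init__(self, window_s=RECENT_WINDOW_S):
        self.window_s = window_s
        self.samples = deque()  # (timestamp_s, n_bytes)

    def record(self, now, n_bytes):
        self.samples.append((now, n_bytes))
        # Drop samples that have aged out of the window.
        while self.samples and self.samples[0][0] < now - self.window_s:
            self.samples.popleft()

    def rate(self, now):
        """Bytes/second over the last window_s seconds.

        Dividing by the full window (rather than the span of samples)
        deliberately treats silence as zero throughput, which is exactly
        the stall signal we want.
        """
        cutoff = now - self.window_s
        recent = sum(b for t, b in self.samples if t >= cutoff)
        return recent / self.window_s
```

This rate would then be fed into the same dead-factor comparison as the overall rate, but against RECENT_TRANSFER_RATE_DEAD_FACTOR.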

Given the above, doesn't it also mean that we can throw away failure tables??? Am I going nuts here? What would be the use of failure tables in such a system?


2 things:
1. Keys that Frost polls, for example, need to not take up a full
search's resources. Hence we DNF when we get a recently DNFed key. We
could still do this, with the new meaning of DNF.
2. We don't want such keys to influence routing stats and estimators.
This is the secondary failure table, discussed a while ago on this list.

Okay, I see now. But at least we won't need to create failure tables for each individual node, right?

If I'm right about all that, then we have all the details needed for implementation and we now just need a name for it. Not hops-to-live, not time-to-live... how about indefinite-to-live, or iTL? Looks kinda Macintoshy. Hip.


How about Flood? :)
hmm... might be bad for PR. ;)


I've re-convinced myself that we (you) should go ahead and implement iTL, Ian's backoff, unobtanium, and the new estimate() all at once, ASAP after you've gotten 0.5.3 out the door.

Whatcha think?

-Martin




_______________________________________________ Devl mailing list [EMAIL PROTECTED] http://dodo.freenetproject.org/cgi-bin/mailman/listinfo/devl
