On Thursday 24 July 2003 09:17 pm, Scott Young wrote:
> On Thu, 2003-07-24 at 19:41, Ian Clarke wrote:
> > Good question, although it is hard to think of an improvement upon the
> > current approach, which is to assume that the least-recently-accessed
> > key is the key least likely to be requested again.
> >
> > Ian.
>
> Analyzing how well the data-removal algorithm works would require a
> little information stored about each deleted key (such as time of
> deletion and the key). The parameter to be minimized is the probability
> of a deleted key being requested again. Some potential algorithms for
> this could correlate the routing algorithm's time-estimates with the
> keys in the DataStore. For example, if the minimum time for requesting
> a key is high, requests for that part of the keyspace might be
> less frequent, and therefore it might be better to delete from that part
> of the keyspace. It would be up to the algorithm to learn such
> correlations ('cause i'm too lazy to analyze the data myself and update
> alchemy variables accordingly).
>
> Another possible correlation might be in file size. High bandwidth
> nodes would be good at transferring large files, but low bandwidth low
> latency nodes would be like express lanes for small files. With the new
> CHK specification, the file size could be used as a variable to learn
> with in the routing and data deletion algorithms.
The problem with this is that it does not really solve the underlying problem.
On the one hand, you could have each node cache the N most popular requests,
which would be good for high-bandwidth nodes and provide large redundancy
but extremely small capacity. On the other hand, you could have each node
cache the N requests that it perceives to be closest to its degree of
specialization, which would be good for low-bandwidth nodes and provide high
specialization and network capacity, but lower redundancy and speed.

Either way could work, but the optimal solution is somewhere in between. The
question is how do you DEFINE the optimal network? Each node could balance
these two extremes to meet its current needs, but what do you optimize for?
If you optimize for local near-term routing speed, you'll always get the
first configuration. We currently use a version of the first system with lots
of built-in fudge factors. But what is the BEST solution? When does
redundancy become more important than capacity?
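As a thought experiment, the two extremes above can be seen as endpoints of a single weighted eviction score. This is a hypothetical sketch, not Freenet's actual datastore code: the names (`eviction_score`, `keyspace_distance`, `choose_victim`), the one-day staleness horizon, and the linear blend parameter `alpha` are all invented for illustration.

```python
def keyspace_distance(key: int, specialization: int, keyspace_size: int) -> float:
    """Circular distance from the node's specialization point, normalized to [0, 1]."""
    d = abs(key - specialization) % keyspace_size
    return min(d, keyspace_size - d) / (keyspace_size / 2)

def eviction_score(last_access: float, now: float, key: int,
                   specialization: int, keyspace_size: int,
                   alpha: float = 0.5) -> float:
    """Higher score = better eviction candidate.

    alpha = 1.0 is the first extreme (pure recency/popularity, i.e. LRU);
    alpha = 0.0 is the second (evict whatever is farthest from the node's
    keyspace specialization). Anything in between is a blend.
    """
    staleness = now - last_access  # seconds since last request for this key
    # Normalize staleness against an arbitrary one-day horizon.
    recency_term = min(staleness / 86400.0, 1.0)
    distance = keyspace_distance(key, specialization, keyspace_size)
    return alpha * recency_term + (1.0 - alpha) * distance

def choose_victim(store: dict, now: float, specialization: int,
                  keyspace_size: int, alpha: float = 0.5) -> int:
    """Pick a deletion victim from a dict mapping key -> last_access_time."""
    return max(store, key=lambda k: eviction_score(
        store[k], now, k, specialization, keyspace_size, alpha))
```

With alpha pinned at 1.0 this degenerates to the current LRU-style behaviour; the open question in the text is really how (or whether) a node should move alpha, since any fixed value is just another fudge factor.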
_______________________________________________
devl mailing list
[EMAIL PROTECTED]
http://hawk.freenetproject.org:8080/cgi-bin/mailman/listinfo/devl