On Sat, May 26, 2001 at 10:00:05AM -0700, Ian Clarke wrote:
<>
> 5.1 Reservations
> 
> One thing that worries me about this definition of closeness is that it is
> entirely dependent on what keys a node currently holds. An outside observer
> with knowledge of only a subset of keys on the node would not be able to
> predict the closeness of two keys calculated by that node. It also means
> that as each node prunes and updates its node routing list it will change
> the behaviour of future routing decisions in unpredictable ways.
> 
> This may or may not affect routing, but it certainly does affect our ability
> to predict how the distributed network will behave.

The distance used for the "closeness" is the numerical distance between
the keys, which is the number of leaves between them in the tree if the
tree were full, not the number of leaves in the current tree.
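
As a minimal sketch of that comparison, assuming keys are fixed-length
bit strings read as unsigned integers (the class and method names here
are hypothetical, not taken from the actual node code):

```java
import java.math.BigInteger;

public class KeyDistanceSketch {

    // Numerical distance between two keys: |a - b|, treating each key's
    // bytes as an unsigned big-endian integer. In a full binary tree of
    // keys this is the number of leaves between them.
    static BigInteger keyDistance(byte[] a, byte[] b) {
        BigInteger ua = new BigInteger(1, a); // signum 1 = non-negative
        BigInteger ub = new BigInteger(1, b);
        return ua.subtract(ub).abs();
    }

    // True if key x is closer to the target than key y is. The outcome
    // depends only on the key values, not on what else the node holds.
    static boolean closer(byte[] target, byte[] x, byte[] y) {
        return keyDistance(target, x).compareTo(keyDistance(target, y)) < 0;
    }
}
```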

We do want it to be difficult for outsiders to predict how nodes will
behave though.

> 6. Potential problems
> 
> The model mentioned in section 2 suggests a potential problem. In order to
> be a good self-organising network that routes requests to nodes containing
> their associated data, it is necessary for nodes in the vicinity of the
> insertion point to have many requests routed through them. In the current
> implementation, caching is very aggressive: a node caches data the first
> time it sees it and declines to request that data again if a request arrives
> for it.
> 
> This has the profound effect of moving the data away from the point of
> insertion, i.e. it starves the routing mechanism of data. Cached files are so
> preferred by the network that after one or two hits, the original data is
> never fetched again. After a while the network forgets how to route to that
> file, simply because it doesn't have to. Later, because of the incredibly
> high level of thrashing in the caches, all the caches drop the file, and the
> file is essentially lost.
> 
> In short, caching may be the cancer killing freenet.

I don't know about this. I have thought of it more as an issue of nodes
losing data they should be keeping because they are caching data that
matters little to them. The idea that too little data on Freenet could
also cause the routing to break down is an interesting one, though.
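
To pin down the behaviour being discussed, here is a rough sketch of
the current aggressive caching, under a simplified single-node view
with one LRU store shared by inserted and cached files; the names are
hypothetical and not taken from the actual code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AggressiveCacheSketch {
    private final Map<String, byte[]> store;

    AggressiveCacheSketch(final int maxEntries) {
        // One LRU datastore for everything: a thrashing cache evicts
        // older entries, including files inserted at this node.
        store = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Data is cached the first time the node sees it, whether it came
    // from an insert or back along the path of a successful request.
    void seeData(String key, byte[] data) {
        store.put(key, data);
    }

    // A request for data the node already holds is answered locally and
    // never forwarded, so upstream nodes stop seeing traffic for that key.
    byte[] handleRequest(String key) {
        return store.get(key); // null means "route on to the next node"
    }
}
```

In this picture a locally answered request is never forwarded, which is
the starvation effect described above, and since the one store holds
both kinds of data, heavy caching can evict inserted files as well.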

> 6.1 Further exacerbation
> 
> Furthermore, the cache shares space with inserted files - a thrashing cache
> guarantees the loss of all data legitimately inserted into a node.

The way this is put lends too much confidence to the precision of the
insert. That a file was cached from a request does not mean it "belongs"
on a node any less than if it had been cached from the insert.

<>
> 7. Suggestions
> 
> There are unfortunate consequences to fiddling with the current
> implementation of freenet - people are starting to use it, and thus it is
> difficult to make radical changes to its behaviour in the interests of
> experimentation.

Not really.

> 7.1 Disable caching
> 
> A draconian experiment that would be extremely useful would be to simply
> disable all caching in future versions of the freenet client. This would let
> us determine the following:

I can promise that this will not work at all.

> a) Does the routing algorithm work? Simulations have been done, but do they
> match reality well enough?

All simulations used caching.

> b) How well does it work?

Not well enough.

> c) How can we make it better? We could test some questions, like:
>         i)   Does it help to make the search algorithm stochastic?

It already is, in more ways than I can count.

>         ii)  Does it help for data to be inserted at more than one point?
>
>         iii) Should we use a more absolute key comparison method such
>              as hamming distances or log(|knownkey-searchkey|)?

Log is a strictly increasing function; there is no point in taking the
log of something if all you are going to do is compare it for size
anyway.
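
To spell that out in the notation of point iii):

    |knownkey1 - searchkey| < |knownkey2 - searchkey|
        <=>  log|knownkey1 - searchkey| < log|knownkey2 - searchkey|

so ranking candidate keys by the log of the distance always picks the
same next hop as ranking by the distance itself.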

> 7.2 Limit caching
> 
> Another option is to severely curtail the caching without completely
> disabling it. 
> 
> 7.2.1 Probabilistic caching
> 
> A node could have two parameters. The first parameter (call it C) would
> specify the probability of the node caching a file. The second parameter
> (call it H) is the probability that a request for the file would produce a
> "hit" on the cached data.

The H value is a passthrough - Ian doesn't like those.
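
A minimal sketch of what the C/H proposal would mean in code, assuming
a node-local cache keyed by the routing key (the class and method names
are hypothetical, not part of the actual node):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class ProbabilisticCacheSketch {
    private final Map<String, byte[]> cache = new HashMap<>();
    private final Random rng = new Random();
    private final double cacheProb; // C: chance of caching data we see
    private final double hitProb;   // H: chance a cached copy answers a request

    ProbabilisticCacheSketch(double cacheProb, double hitProb) {
        this.cacheProb = cacheProb;
        this.hitProb = hitProb;
    }

    // Called whenever data passes through the node.
    void maybeCache(String key, byte[] data) {
        if (rng.nextDouble() < cacheProb) {
            cache.put(key, data);
        }
    }

    // Called when a request arrives. With probability 1 - H the node acts
    // as if it did not have the data and lets the request be routed on;
    // that is the "passthrough" objected to above.
    byte[] handleRequest(String key) {
        byte[] data = cache.get(key);
        if (data != null && rng.nextDouble() < hitProb) {
            return data;
        }
        return null; // caller forwards the request to the next-closest node
    }
}
```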

-- 
'DeCSS would be fine. Where is it?'
'Here,' Montag touched his head.
'Ah,' Granger smiled and nodded.

Oskar Sandberg
oskar at freenetproject.org
