2009/3/4 Matthew Toseland <[email protected]>:

> STORAGE:
> There are 6 stores: a store and a cache for each of SSK, CHK and public key
> (for SSKs; we can avoid sending this over the network in many cases). Every
> block fetched is stored in the relevant cache. When an insert completes, it
> will be stored only if there is no node with a location closer to the key's
> location form than this node. But this calculation ignores peers with a lower
> reported uptime than 40% (this is simply an estimate computed by the node and
> sent to us on connection), and ignores backed off nodes unless all nodes are
> backed off. It does not take into account recently failed status etc.
> Arguably we should send uptime messages more often than on connection: this
> disadvantages high uptime nodes that connected when they were first
> installed. In practice the store fills up *much* slower than the cache.

There is a technique that would make the store fill more quickly than
it currently does without any drawbacks (aside from a small amount of
development time ;) ).  Right now, there are two hash tables with one
slot per location.  One hash table is the store, one the cache.
(Obviously I'm only considering a single one of CHK/SSK/pubkey.)  This
is equivalent to a single hash table with two slots per location, with
rules that decide which slot to use rather than which hash table.

Currently, the rule is that inserts go in the store and fetches in the
cache.  The change is this: when storing a key in the cache, if the
cache slot is already occupied but the store slot is empty, put it in
the store instead (and vice versa).  Even without the bloom filter,
this doesn't add any disk reads -- by treating it as one hash table
with two slots per location, you put those two slots adjacent on disk
and simply make one larger read to retrieve both keys.

This is a technique I first saw in hash tables for chess programs.
Evaluation results are cached, with two distinct slots per location.
One slot stores the most recent evaluation result, the other stores
the most expensive to recompute.  There is a noticeable performance
improvement in the cache if you are willing to store a result in the
wrong slot when only one of the two is full already.

Evan Daniel
_______________________________________________
Devl mailing list
[email protected]
http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl

Reply via email to