2009/3/4 Matthew Toseland <[email protected]>: > STORAGE: > There are 6 stores: a store and a cache for each of SSK, CHK and public key > (for SSKs; we can avoid sending this over the network in many cases). Every > block fetched is stored in the relevant cache. When an insert completes, it > will be stored only if there is no node with a location closer to the key's > location form than this node. But this calculation ignores peers with a lower > reported uptime than 40% (this is simply an estimate computed by the node and > sent to us on connection), and ignores backed off nodes unless all nodes are > backed off. It does not take into account recently failed status etc. > Arguably we should send uptime messages more often than on connection: this > disadvantages high uptime nodes that connected when they were first > installed. In practice the store fills up *much* slower than the cache.
There is a technique that would make the store fill more quickly than it currently does without any drawbacks (aside from a small amount of development time ;) ). Right now, there are two hash tables with one slot per location. One hash table is the store, one the cache. (Obviously I'm only considering a single one of CHK/SSK/pubkey.) This is equivalent to a single hash table with two slots per location, with rules that decide which slot to use rather than which hash table. Currently, the rule is that inserts go in the store and fetches in the cache. The change is this: when storing a key in the cache, if the cache slot is already occupied but the store slot is empty, put it in the store instead (and vice versa). Even without the bloom filter, this doesn't add any disk reads -- by treating it as one hash table with two slots per location, you put those two slots adjacent on disk and simply make one larger read to retrieve both keys. This is a technique I first saw in hash tables for chess programs. Evaluation results are cached, with two distinct slots per location. One slot stores the most recent evaluation result, the other stores the most expensive to recompute. There is a noticeable performance improvement in the cache if you are willing to store a result in the wrong slot when only one of the two is full already. Evan Daniel _______________________________________________ Devl mailing list [email protected] http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
