On Thursday 12 March 2009 00:55:35 Daniel Cheng wrote:
> On Thu, Mar 12, 2009 at 1:03 AM, Evan Daniel <evanbd at gmail.com> wrote:
> > 2009/3/4 Matthew Toseland <toad at amphibian.dyndns.org>:
> >
> >> STORAGE:
> >> There are 6 stores: a store and a cache for each of SSK, CHK and public
> >> key (for SSKs; we can avoid sending this over the network in many cases).
> >> Every block fetched is stored in the relevant cache. When an insert
> >> completes, it will be stored only if there is no node with a location
> >> closer to the key's location than this node. But this calculation ignores
> >> peers with a lower reported uptime than 40% (this is simply an estimate
> >> computed by the node and sent to us on connection), and ignores backed off
> >> nodes unless all nodes are backed off. It does not take into account
> >> recently failed status etc. Arguably we should send uptime messages more
> >> often than on connection: this disadvantages high-uptime nodes that
> >> connected when they were first installed. In practice the store fills up
> >> *much* more slowly than the cache.
> >
> > There is a technique that would make the store fill more quickly than
> > it currently does without any drawbacks (aside from a small amount of
> > development time ;) ). Right now, there are two hash tables with one
> > slot per location. One hash table is the store, one the cache.
> > (Obviously I'm only considering a single one of CHK/SSK/pubkey.) This
> > is equivalent to a single hash table with two slots per location, with
> > rules that decide which slot to use rather than which hash table.
> >
> > Currently, the rule is that inserts go in the store and fetches in the
> > cache. The change is this: when storing a key in the cache, if the
> > cache slot is already occupied but the store slot is empty, put it in
> > the store instead (and vice versa). Even without the bloom filter,
> > this doesn't add any disk reads -- by treating it as one hash table
> > with two slots per location, you put those two slots adjacent on disk
> > and simply make one larger read to retrieve both keys.
> >
> > This is a technique I first saw in hash tables for chess programs.
> > Evaluation results are cached, with two distinct slots per location.
> > One slot stores the most recent evaluation result, the other stores
> > the most expensive to recompute. There is a noticeable performance
> > improvement in the cache if you are willing to store a result in the
> > wrong slot when only one of the two is full already.
> 
> I think this is the right way to go.
> The 50%:50% cache/store size split causes some confusion among users
> (there was some discussion on Frost months ago), and wastes the
> allocated disk space.
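For reference, the insert-acceptance policy in the quoted STORAGE paragraph
boils down to roughly the sketch below. PeerNode and its accessors, and the
bare location doubles, are illustrative stand-ins, not the actual code:

import java.util.List;

final class StorePolicySketch {

    // Circular distance on the [0, 1) keyspace.
    static double distance(double a, double b) {
        double d = Math.abs(a - b);
        return Math.min(d, 1.0 - d);
    }

    // Store on insert only if no eligible peer is closer to the key than this
    // node: peers reporting under 40% uptime are ignored, and backed-off peers
    // are ignored unless every peer is backed off.
    static boolean shouldStoreOnInsert(double keyLoc, double myLoc, List<PeerNode> peers) {
        boolean allBackedOff = true;
        for (PeerNode p : peers) {
            if (!p.isBackedOff()) { allBackedOff = false; break; }
        }
        double myDist = distance(myLoc, keyLoc);
        for (PeerNode p : peers) {
            if (p.reportedUptime() < 0.40) continue;        // low reported uptime: ignore
            if (p.isBackedOff() && !allBackedOff) continue; // backed off (and not everyone is): ignore
            if (distance(p.getLocation(), keyLoc) < myDist) return false; // someone eligible is closer
        }
        return true;
    }

    // Illustrative peer view, not Freenet's actual API.
    interface PeerNode {
        double getLocation();
        double reportedUptime(); // fraction in [0, 1], currently sent once on connect
        boolean isBackedOff();
    }
}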
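And a minimal sketch of the two-slots-per-location scheme Evan describes: one
table, slot 0 playing the store and slot 1 the cache, with a write spilling
into the sibling slot when its own slot is taken but the sibling is free. The
class and the in-memory array are purely illustrative; on disk the two slots
would sit adjacent so a single larger read covers both:

final class TwoSlotTableSketch {
    static final int STORE_SLOT = 0; // inserts prefer this slot
    static final int CACHE_SLOT = 1; // fetched blocks prefer this slot

    private final byte[][][] table; // [location][slot] -> block; stands in for the on-disk file

    TwoSlotTableSketch(int locations) {
        table = new byte[locations][2][];
    }

    void put(int location, int preferredSlot, byte[] block) {
        int sibling = 1 - preferredSlot;
        if (table[location][preferredSlot] != null && table[location][sibling] == null) {
            table[location][sibling] = block;       // spill into the empty sibling slot
        } else {
            table[location][preferredSlot] = block; // normal case, or overwrite if both are full
        }
    }

    // One read covers both adjacent slots; the caller checks which, if either,
    // actually holds the wanted key.
    byte[][] readBoth(int location) {
        return table[location];
    }
}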

Two points:

1. There may be issues with the store fill rate. We need to discuss this with
our theoreticians; Oskar has commented on it. It was his simulations that led
to the current policy, but it may very well be that we need to change it.

2. If we have a unified store/cache, would it make sense to have a flag 
indicating whether the block came from the store or the cache? Combined with 
an accurate estimate of the total in each (the salted hash store has major 
bugs in this area atm sadly), we should be able to clobber the "right" slot, 
no?
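One possible reading of "clobber the right slot": each slot carries an origin
flag (insert vs. fetch), and when both sibling slots are full we overwrite the
one whose flag matches the incoming write. Purely a sketch on top of the
two-slot layout above, not existing code:

final class FlaggedTwoSlotSketch {
    enum Origin { STORE, CACHE }

    private static final class Slot {
        byte[] block;
        Origin origin;
    }

    private final Slot[][] table; // [location][2 sibling slots]

    FlaggedTwoSlotSketch(int locations) {
        table = new Slot[locations][2];
        for (Slot[] pair : table) {
            pair[0] = new Slot();
            pair[1] = new Slot();
        }
    }

    void put(int location, Origin origin, byte[] block) {
        Slot[] pair = table[location];
        Slot target = null;
        for (Slot s : pair) {
            if (s.block == null) { target = s; break; }        // prefer an empty slot
        }
        if (target == null) {
            for (Slot s : pair) {
                if (s.origin == origin) { target = s; break; } // else clobber the same-origin slot
            }
        }
        if (target == null) target = pair[0];                  // both hold the other origin: pick one
        target.block = block;
        target.origin = origin;
    }
}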