On May 1, 2009, at 9:35 AM, Matthew Toseland wrote:
IMPLEMENTING IT:
Main tasks:
- Converting our datastore indexes into a compact, slightly more lossy, 1-bit filter. (Easy) (sketched below)
- Creating a snapshot once an hour, and keeping it for some period (1 week?). (Easy)
- Creating an efficient binary diff format for hour-to-hour diffs, and keeping them. (Moderate)
This might actually be quite hard, as an efficient bloom filter will scatter even a few updates all over the filter (but see the XOR sketch below).
- Sending our filters to our darknet peers. (Easy)
- Updating our darknet peers once an hour with the diffs we create anyway. (Easy)
- Recording what version of the indexes we have sent to each darknet peer. (Easy)
- Updating our darknet peers when they reconnect after downtime with all the diffs they missed. (Easy)
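For the 1-bit filter itself, the conversion is only a few lines. A minimal sketch, assuming SHA-256 over the routing key and made-up sizes (BLOOM_BITS, HASHES); the real datastore iteration would obviously look different:

    import java.security.MessageDigest;
    import java.util.BitSet;

    class BloomBuildSketch {
        static final int BLOOM_BITS = 1 << 22;   // 4M bits = 512 KiB; illustrative only
        static final int HASHES = 4;

        // Build a 1-bit bloom filter over the keys currently in the datastore.
        static BitSet buildFilter(Iterable<byte[]> storedKeys) throws Exception {
            BitSet filter = new BitSet(BLOOM_BITS);
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] key : storedKeys) {
                byte[] h = md.digest(key);
                for (int i = 0; i < HASHES; i++) {
                    // Derive each hash function from 4 bytes of the digest.
                    int idx = ((h[4 * i] & 0xff) << 24 | (h[4 * i + 1] & 0xff) << 16
                             | (h[4 * i + 2] & 0xff) << 8 | (h[4 * i + 3] & 0xff))
                             & Integer.MAX_VALUE;
                    filter.set(idx % BLOOM_BITS);
                }
            }
            return filter;
        }
    }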
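As for the diff format: XORing consecutive snapshots gives a BitSet with bits set only where the filter changed, which compresses fine (gzip it, or send the changed bit indices) even though those bits are scattered. Sketch, assuming plain java.util.BitSet snapshots:

    import java.util.BitSet;

    class FilterDiffSketch {
        // Diff = XOR of old and new snapshot: only the changed bits are set,
        // so run-length or gzip encoding stays small if little has changed.
        static BitSet diff(BitSet oldSnapshot, BitSet newSnapshot) {
            BitSet d = (BitSet) newSnapshot.clone();
            d.xor(oldSnapshot);
            return d;
        }

        // Applying a diff to bring an old copy up to date is the same operation.
        static void apply(BitSet filter, BitSet diff) {
            filter.xor(diff);
        }
    }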
IMHO this revision/diff/update system is too exacting a mechanism.
If this is to be used only for failing requests, perhaps a quicker-to-implement & lossy mechanism should be considered.
i.e. a sparse (and decaying?) bloom filter
--
initialize to empty (of course)
choose an update period (1 week, as you mentioned?)
send out 'rolling' updates of our bloom filter
*i.e. start at 'chunk' 0, then 1, then 2, ...
*at such a rate that the whole filter is transmitted once per period (one of its n chunks every period/n)
whenever we get an update from our peer, it overwrites that chunk of the bloom filter
*if we don't get an update from our peer, zero out that chunk
*if we have been offline, zero out the chunks we missed
--
We still have to decide whether to listen to updates at all (i.e. keep our peers' huge bloom filters).
Transmits are WAY easier.
Receives are easy.
After ~ one period the filter becomes usable (starts producing hits).
The downside is that the more a peer is down, the less likely we are to have a usable filter for it... unless, that is, a peer can request a chunk of the bloom filter that it missed/wants (!!?!)
Or even... just make it a pull mechanism! Those peers interested in our bloom filter can request any part of it; but it might cost them request-capital. (A rough sketch of the receive-side chunk bookkeeping follows.)
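Receive-side bookkeeping for the rolling chunks could be as simple as the sketch below; the chunk count, chunk size and bit layout are invented for illustration:

    class RollingFilterSketch {
        static final int CHUNKS = 168;            // e.g. one chunk per hour over a one-week period
        final byte[][] peerFilter = new byte[CHUNKS][];

        // A chunk update arrived from the peer: overwrite that chunk.
        void onChunk(int index, byte[] data) {
            peerFilter[index] = data.clone();
        }

        // No update arrived for this slot (peer silent, or we were offline): forget it.
        void missedChunk(int index) {
            peerFilter[index] = null;
        }

        // A key only counts as a probable hit if the chunk holding its bit is
        // present and the bit is set; unknown chunks never produce hits.
        boolean mightHave(int bitIndex, int bitsPerChunk) {
            byte[] chunk = peerFilter[bitIndex / bitsPerChunk];
            if (chunk == null) return false;
            int offset = bitIndex % bitsPerChunk;
            return (chunk[offset / 8] & (1 << (offset % 8))) != 0;
        }
    }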
- Using any filters we have to send GetOfferedKey-like requests for the data, and handling such requests. (Easy)
This is the core of the enhancement, of course; but it may be easier to just request the data (making it like a turtled request) rather than define a new message. (A minimal requester-side check is sketched below.)
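The requester-side check is trivial; roughly the following, where PeerStub, the single-bit hash and the field names are placeholders rather than real node code:

    import java.util.Arrays;
    import java.util.BitSet;

    class BloomFetchSketch {
        // Before giving up on (or turtling) a failing request, see whether any
        // directly connected peer's filter claims to have the key; if so, ask it first.
        static PeerStub pickPeer(byte[] routingKey, Iterable<PeerStub> peers) {
            for (PeerStub p : peers) {
                if (p.filter != null && p.filter.get(bitFor(routingKey, p.filterBits)))
                    return p;   // probable hit; may still be a false positive
            }
            return null;
        }

        // Must be the same hash the filter was built with; reduced to one bit here.
        static int bitFor(byte[] routingKey, int filterBits) {
            return (Arrays.hashCode(routingKey) & Integer.MAX_VALUE) % filterBits;
        }

        static class PeerStub {
            BitSet filter;     // last filter received from this peer, if any
            int filterBits;    // its size in bits
        }
    }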
- Track success rates for bloom-based fetches. (Optional, can be done later) (Easy)
- Tracking a large number of opennet peers which we have connected to over the last week, recording their total uptime over the week. (Moderate)
- Calculating which have sufficient uptime for sharing bloom filters to be worthwhile. (Easy) (see the uptime sketch below)
- Sending those nodes the bloom filters, keeping them up to date, and recording what version we sent them last. (Easy)
- Limit the total memory usage for Bloom filters, by adding it up and then dropping the biggest filters from active use until we're not over the limit. Tell our peers so they don't send us updates. (Moderate) (see the memory-cap sketch below)
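For the opennet uptime bookkeeping, something along these lines would do; the one-week window and the threshold argument are assumptions:

    import java.util.HashMap;
    import java.util.Map;

    class UptimeSketch {
        static final long WEEK_MS = 7L * 24 * 60 * 60 * 1000;
        final Map<String, Long> connectedMillisThisWeek = new HashMap<>();

        // Called when a peer disconnects (or periodically while it is connected).
        void addUptime(String peerId, long millis) {
            connectedMillisThisWeek.merge(peerId, millis, Long::sum);
        }

        // Only worth sending filters (and diffs) to peers that are around often enough.
        boolean worthSharingWith(String peerId, double minFraction) {
            long up = connectedMillisThisWeek.getOrDefault(peerId, 0L);
            return up >= (long) (minFraction * WEEK_MS);
        }
    }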
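And the memory cap can be as blunt as sorting by size and dropping the largest filters until we fit; the limit and the PeerFilter holder are made up for the sketch:

    import java.util.Comparator;
    import java.util.List;

    class FilterMemoryCapSketch {
        static final long MAX_FILTER_BYTES = 8L * 1024 * 1024;   // arbitrary example limit

        // Drop the biggest peers' filters from active use until the total fits;
        // the caller should then tell those peers to stop sending updates.
        static void enforceLimit(List<PeerFilter> filters) {
            filters.sort(Comparator.comparingLong((PeerFilter f) -> f.sizeBytes).reversed());
            long total = 0;
            for (PeerFilter f : filters) total += f.sizeBytes;
            for (PeerFilter f : filters) {
                if (total <= MAX_FILTER_BYTES) break;
                total -= f.sizeBytes;
                f.active = false;
            }
        }

        static class PeerFilter {
            long sizeBytes;
            boolean active = true;
        }
    }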