[freenet-dev] Can we implement Bloom filter sharing quickly???

Matthew Toseland Mon, 4 May 2009 16:35:43 +0100

On Saturday 02 May 2009 01:23:23 Evan Daniel wrote:
> On Fri, May 1, 2009 at 7:59 PM, Matthew Toseland
> <toad at amphibian.dyndns.org> wrote:
> > On Saturday 02 May 2009 00:53:27 Evan Daniel wrote:
> >> On Fri, May 1, 2009 at 7:01 PM, Matthew Toseland
> >> <toad at amphibian.dyndns.org> wrote:
> >> > On Friday 01 May 2009 22:43:50 Robert Hailey wrote:
> >> >>
> >> >> On May 1, 2009, at 3:46 PM, Evan Daniel wrote:
> >> >>
> >> >> > Yes, that's a question worth considering. ?There are both 
performance
> >> >> > and security issues involved, I think. ?Note that the partition 
could
> >> >> > be a set of contiguous regions (allowing performance optimization
> >> >> > around which piece of the keyspace you send info about), but it 
could
> >> >> > just as easily be determined by a hash function instead.
> >> >> >
> >> >> > You still check the same number of filters overall -- one per peer.
> >> >> > The difference is that for some peers you may have a partial filter
> >> >> > set, and therefore sometimes check their filters, instead of 
deciding
> >> >> > you don't have the memory for that peer's filter and never checking
> >> >> > it.
> >> >>
> >> >> Maybe if we partition it we can also get a free datastore histogram on
> >> >> the stats page.
> >> >
> >> > No, we cannot divide by actual keyspace, the keys must be hashed first, 
or
> > the
> >> > middle bloom filter will be far too big.
> >>
> >> Well, you could partition by actual keyspace as long as the partitions
> >> are (approximately) equal in population rather than fraction of the
> >> keyspace they cover. ?Doing that is only mildly nontrivial, and gives
> >> you a histogram with variable-width bars, but still an accurate
> >> histogram. ?Each bar would cover the same area; tall and skinny near
> >> the node's location, wide and short away from it.
> >
> > And if the distribution was ever to change ... it's not mildly nontrivial
> > IMHO.
> >
> > Anyway, the first version should simply use the existing filters, to make
> > things easy.
> 
> You have to recompute the bloom filter occasionally regardless,
> otherwise the counters eventually all saturate.  If the distribution
> changes enough to be problematic, you rebalance the filters.


Fair point.
> 
> If you're willing to be slightly inefficient, you can avoid
> recomputing all the filters.  When one filter starts getting full, you
> split its portion of the keyspace in half and create a pair to take
> its place.  On average your filters will only be 3/4 full or so, but
> you reduce the computational load (though probably not the disk load?
> hmm...)

Maybe.
> 
> Or you could just partition the keyspace by hashing and trust that
> equal size partitions will have equal populations.

This may be better and is certainly easier.
> 
> Evan Daniel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 835 bytes
Desc: This is a digitally signed message part.
URL: 
<https://emu.freenetproject.org/pipermail/devl/attachments/20090504/800efeed/attachment.pgp>

[freenet-dev] Can we implement Bloom filter sharing quickly???

Reply via email to