On Saturday 02 May 2009 01:23:23 Evan Daniel wrote: > On Fri, May 1, 2009 at 7:59 PM, Matthew Toseland > <toad at amphibian.dyndns.org> wrote: > > On Saturday 02 May 2009 00:53:27 Evan Daniel wrote: > >> On Fri, May 1, 2009 at 7:01 PM, Matthew Toseland > >> <toad at amphibian.dyndns.org> wrote: > >> > On Friday 01 May 2009 22:43:50 Robert Hailey wrote: > >> >> > >> >> On May 1, 2009, at 3:46 PM, Evan Daniel wrote: > >> >> > >> >> > Yes, that's a question worth considering. ?There are both performance > >> >> > and security issues involved, I think. ?Note that the partition could > >> >> > be a set of contiguous regions (allowing performance optimization > >> >> > around which piece of the keyspace you send info about), but it could > >> >> > just as easily be determined by a hash function instead. > >> >> > > >> >> > You still check the same number of filters overall -- one per peer. > >> >> > The difference is that for some peers you may have a partial filter > >> >> > set, and therefore sometimes check their filters, instead of deciding > >> >> > you don't have the memory for that peer's filter and never checking > >> >> > it. > >> >> > >> >> Maybe if we partition it we can also get a free datastore histogram on > >> >> the stats page. > >> > > >> > No, we cannot divide by actual keyspace, the keys must be hashed first, or > > the > >> > middle bloom filter will be far too big. > >> > >> Well, you could partition by actual keyspace as long as the partitions > >> are (approximately) equal in population rather than fraction of the > >> keyspace they cover. ?Doing that is only mildly nontrivial, and gives > >> you a histogram with variable-width bars, but still an accurate > >> histogram. ?Each bar would cover the same area; tall and skinny near > >> the node's location, wide and short away from it. > > > > And if the distribution was ever to change ... it's not mildly nontrivial > > IMHO. > > > > Anyway, the first version should simply use the existing filters, to make > > things easy. > > You have to recompute the bloom filter occasionally regardless, > otherwise the counters eventually all saturate. If the distribution > changes enough to be problematic, you rebalance the filters.
Fair point. > > If you're willing to be slightly inefficient, you can avoid > recomputing all the filters. When one filter starts getting full, you > split its portion of the keyspace in half and create a pair to take > its place. On average your filters will only be 3/4 full or so, but > you reduce the computational load (though probably not the disk load? > hmm...) Maybe. > > Or you could just partition the keyspace by hashing and trust that > equal size partitions will have equal populations. This may be better and is certainly easier. > > Evan Daniel -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 835 bytes Desc: This is a digitally signed message part. URL: <https://emu.freenetproject.org/pipermail/devl/attachments/20090504/800efeed/attachment.pgp>
