On Friday 19 June 2009 18:50:04 Evan Daniel wrote:
> On Fri, Jun 19, 2009 at 1:36 PM, Matthew
> Toseland<toad at amphibian.dyndns.org> wrote:
> > On Friday 19 June 2009 17:37:00 Robert Hailey wrote:
> >>
> >> On Jun 18, 2009, at 9:07 PM, Matthew Toseland wrote:
> >>
> >> > CHUNK SIZE PROBLEM:
> >> > ===================
> >> >
> >> > Current plans call for splitting the keyspace up for each datastore, and
> >> > assigning keys to a manageably sized bloom filter for a section of the
> >> > (hashed) keyspace. These can then be transferred separately, are
> >> > useful immediately after being transferred, can be juggled in
> >> > memory, etc. However, we cannot guarantee that the population of such
> >> > a chunk will be small enough for the filter to be effective. Solving
> >> > this requires either moving the boundaries dynamically (possibly
> >> > continually), or making the chunks significantly larger than they
> >> > would need to be in an ideal world.
> >>
> >> Right... but I believe we prescribed that the split would be based on
> >> a hashed value of the key and not by logical keyspace location to
> >> avoid disproportionate chunks.
> >>
> >> That is to say... ideally a node is going to get a disproportionate
> >> amount of cache/store data about its network location.
> >>
> >> N_CHUNKS = STORE_SIZE / MAX_BLOOM_CHUNK
> >> n = H(key) mod N_CHUNKS   (0 <= n < N_CHUNKS)
> >> CHUNK[n].add(key)
> >>
> >> Wouldn't the problem be reduced to finding a well-scattering hash
> >> function then?
> >
> > Yes and no. How much variation will we have even if we divide by hashed 
> > keyspace? Hence how much bigger than the ideal splitting size do we need 
> > each chunk to be to maintain approximately the right false positives ratio?
> 
> If the hash spreads things well, number of keys in a single bloom
> filter should be normally distributed with mean ((total keys) /
> (number of filters)) and standard deviation sqrt((total keys) * (1 /
> (number of filters)) * (1 - 1 / (number of filters))).  (It's a
> binomial distribution; any given key has a 1/(number of filters) chance
> of landing in a specific filter.)  For a 100GiB total store, we have
> 3E6 keys between store and cache (for CHKs, same for SSKs).  That
> implies 17MiB of bloom filters.  If we size the filters at 1MiB each,
> with 17 filters, we have 176k keys per filter on average.  From the
> preceding formula, standard deviation is 408 keys.
> 
> Size variation is only a serious concern if the hash function is not
> distributing the keys at random.  To be safe, we could slightly
> underfill the filters.
> 
> Evan Daniel

Ok, so chunk size is not a problem, as long as we allow a few percent extra.
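
For concreteness, a quick back-of-envelope check (a sketch in Java; the 3E6 
keys / 17 filters figures are Evan's from the quoted mail, everything else is 
illustrative):

import static java.lang.Math.sqrt;

public class FilterSpread {
    public static void main(String[] args) {
        double totalKeys = 3e6;   // ~100GiB store+cache worth of CHKs
        double filters = 17;      // 1MiB bloom filters
        double p = 1.0 / filters;
        double mean = totalKeys * p;                  // ~176k keys per filter
        double sigma = sqrt(totalKeys * p * (1 - p)); // ~408 keys
        // Even a 5-sigma outlier only overshoots by ~2k keys on 176k,
        // i.e. just over 1%, hence "a few percent extra" is plenty.
        System.out.printf("mean=%.0f sigma=%.0f%n", mean, sigma);
    }
}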

Discussions on devl show that sending a sorted list of short hashes costs 
around 41 bits per key on a 500GB store, versus 23 for bloom filters, so that 
option is out.

However, a lossy hashtable of short hashes may be an option. evanbd is working 
on simulations of hashtables; the main issues (a sketch of such a table 
follows this list):
- We need a separate direct-mapped hashtable for the store, but we *never need 
to rebuild it*, as we can correct it on the fly.
- The separate hashtable for sharing will need to be rebuilt after an unclean 
shutdown, but a journal avoids even that. So no rebuilds ever, which IMHO is a 
significant performance/usability improvement.
- False positives are in the same region as with bloom filters, maybe 18 bits 
with 4 probes.
- False negatives exist with hashtables. This is bad; they don't exist with 
bloom filters. But they can be limited, and evanbd is simulating this. What 
proportion of false negatives is acceptable is an open question; simulations 
should show what various probe counts and table size/total keys ratios give.
- Update traffic is probably significantly more compact than with bloom 
filters, just as with a sorted list of short hashes.
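
A minimal sketch of the lossy hashtable idea (Java; the 18-bit check hash and 
4 quadratic probes match the numbers in the list above, everything else is 
illustrative, not actual Freenet code):

public class LossyHashTable {
    private static final int PROBES = 4;
    private final int[] cells; // 0 = empty, else (18-bit check hash) + 1
    private final int mask;

    public LossyHashTable(int sizePowerOfTwo) { // size must be a power of two
        cells = new int[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    // index from the low bits of the salted key hash, check hash from higher bits
    private int index(long h) { return (int) (h & mask); }
    private int check(long h) { return (int) ((h >>> 32) & 0x3FFFF) + 1; }

    public void put(long saltedKeyHash) {
        int idx = index(saltedKeyHash), chk = check(saltedKeyHash);
        for (int i = 0; i < PROBES; i++) {
            int slot = (idx + i * i) & mask; // quadratic probing
            if (cells[slot] == 0) { cells[slot] = chk; return; }
        }
        // All probed cells taken: key silently dropped -> a future false negative.
    }

    public boolean mightContain(long saltedKeyHash) {
        int idx = index(saltedKeyHash), chk = check(saltedKeyHash);
        for (int i = 0; i < PROBES; i++) {
            int slot = (idx + i * i) & mask;
            if (cells[slot] == chk) return true; // possible false positive
        }
        return false;
    }
}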

IRC log:

[14:42:56] <evanbd> toad_: If you're worried about disk space, solving the 
problem of stores never filling would be a good start :)
[14:42:59] <toad_> occam's razor ... don't multiply your entities more than you 
absolutely need to
[14:43:21] <evanbd> True...  complex UI is bad, and Freenet already doesn't fit 
metaphors users expect
[14:43:25] <toad_> evanbd: with a combined store/cache? maybe ...
[14:44:06] <toad_> it's not easy with sdiz 's store but it is probably possible
[14:45:21] <toad_> we should seriously consider it though, it could gain us a 
lot of short-term storage space
[14:45:37] <evanbd> Speaking of store / cache implementation issues...  have 
you or sdiz considered cuckoo hashing for the store?  It's quite possible you 
could get better utilization and improved read performance compared to the 
current quadratic probing.
[14:45:49] <toad_> i haven't, talk to sdiz
[14:46:27] <evanbd> toad_: Yes, currently I think most of the space in the 
stores is not available to the network in any meaningful sense, if people 
actually use large stores (which I suspect is common, but don't know)
[14:46:39] <toad_> evanbd: "most" ??
[14:46:52] <toad_> you mean most of the store's half? yeah ...
[14:47:26] <toad_> it fills up very slowly, and given the bandwidth:storage 
ratios, most nodes with larger stores probably don't have a full store
[14:47:30] <evanbd> Yeah.  I recently expanded it, but previously I had 100 GiB 
(store + cache), of which the cache was 95%+ full, and the store ~12%.
[14:47:55] <toad_> yeah ... we need to solve that, maybe before 0.8.0 ...
[14:47:58] <evanbd> I suspect that's more space than average, but 20-50GiB 
average wouldn't surprise me at all.
[14:48:28] <toad_> of course it's another complicating factor for bloom filter 
sharing ...
[14:48:56] <evanbd> Complicating to security, or implementation details?
[14:49:09] <toad_> implementation
[14:49:18] <toad_> splitting into chunks of reasonable size specifically
[14:49:57] <evanbd> Why is that harder than with separate store/cache?
[14:50:22] <toad_> more variation in the number of keys
[14:51:56] <toad_> evanbd: any opinions on the proposal i sent to not transfer 
blooms at all but short key hashes?
[14:52:56] <toad_> Size variation is only a serious concern if the hash 
function is not
[14:52:56] <toad_> distributing the keys at random.  To be safe, we could 
slightly
[14:52:56] <toad_> underfill the filters.
[14:52:58] <toad_> hmmm ok
[14:53:29] <evanbd> toad_:  I think it saves bandwidth and memory for the same 
false positive rate, for the same reasons I computed when suggesting doing it 
in memory.  It also means not having to rebuild the filter periodically.  These 
are good things.
[14:53:44] <toad_> if we have a combined cache/store then we'd just keep a 
single set of bloom filters for the combination ...
[14:53:47] <evanbd> There *might* be security implications, but I'm having 
trouble seeing them.
[14:54:22] <evanbd> Right, and if the hash function isn't distributing things 
well, you should write a paper on the SHA weakness you've found :)
[14:54:43] <toad_> yeah ... but that assumes we can afford all those SHA1's
[14:54:53] <toad_> in practice, the filter on the datastore uses a weak PRNG
[14:55:00] <toad_> because SHA1 is expensive
[14:55:18] <toad_> although imho we can afford one SHA1 ... sdiz is just using 
the PRNG after that iirc
[14:55:20] <evanbd> Surely not *that* expensive.  We're doing write rates 
measured in fractional keys/s
[14:55:32] <toad_> anyway the point is for bloom filter sharing it's one SHA256 
per key per peer
[14:55:38] <evanbd> SHA1 or SHA256?
[14:55:41] <toad_> so 20 or 50 SHA256's
[14:55:55] <toad_> assuming all peers have different nonces, which imho is a 
sensible security precaution
[14:56:27] <toad_> evanbd: well ... if we are using only a small part of the 
output, can we get away with SHA1? SHA1 is known to be somewhat weak ...
[14:56:36] <sdiz> toad_: we are using sha
[14:56:46] <evanbd> How long does it take?  If we had a write rate of 5 key/s 
and 50 peers (fast node, improved network performance), that's 250 SHA/s.  
Surely they're not *that* expensive.
[14:56:49] <toad_> sdiz: you SHA it once and then use a PRNG, correct?
[14:57:03] <toad_> evanbd: it's not just writes, it's any request for a key
[14:57:41] <sdiz> oh, ya
[14:57:47] <toad_> evanbd: yeah it may not be a problem ... it's something to 
check ... 1sec, a bit of bsh ...
[14:58:13] <sdiz> that's MT.   ...iirc, the first few entries of MT are not 
that random,
[14:59:03] <evanbd> Hmm, does it have to be on reads rather than writes?  We 
salt the hashes before transmitting the filter, and send the salt with the 
filter.  What if the recipient chose the salt, so all his filters used the same 
salt?
[14:59:41] <sdiz> we salt them when we write it in store ....  so the recipient 
can't choose.
[15:00:22] <evanbd> sdiz: but weren't we planning to use different salts for 
store vs publicized filters?  I suppose if we share hashes instead of bloom 
filters that wouldn't work.
[15:00:52] <toad_> bsh is a wonderful thing ...
[15:01:08] <toad_> 1M SHA256's in 10 seconds on one core of a phenom 2 2.2GHz x 
4
[15:01:19] <toad_> so SHA256's are *fast*
[15:01:39] <toad_> sdiz: i guess you switched it because of store maintenance 
cpu usage?
[15:01:47] <evanbd> So 1k hashes/s represents 1% CPU load.  If it's at all 
useful, that sounds worth it :)
[15:01:56] <toad_> evanbd: ok
[15:02:07] <toad_> evanbd: no i mean when a request comes in we have to check 
all the shared bloom filters
[15:02:32] <toad_> evanbd: so we do 50x SHA256's, and then 50x binary searches 
in the filters for each
[15:02:38] <evanbd> toad_: right, but we only have to do a separate hash for 
each filter if the salts are different.
[15:02:43] <sdiz> hmm... that needs rebuilding the bloom filter ... i would 
wait for the full design before this change.
[15:03:00] <toad_> sdiz: I'm not proposing to change that
[15:03:26] <toad_> sdiz: we need a separate filter system for bloom filter 
sharing, at least in the short term, because it only includes keys added since 
we changed the caching criteria
[15:03:43] <evanbd> I suppose publicizing the store salt doesn't really 
represent a big security concern.
[15:03:50] <toad_> evanbd: yes and there are security reasons to have different 
salts
[15:04:10] <toad_> evanbd: specifically to prevent a hash-based exploit from 
affecting more than one node
[15:04:15] <sdiz> i have not measured sha256...  i use mt because i think we 
don't need that kind of security on the bloom filter
[15:04:16] <evanbd> Different salts between which?
[15:04:42] <evanbd> Different salts for the one we publish vs what we use on 
the store, you mean?
[15:04:46] <toad_> evanbd: each node has its own salt for the bloom filters (or 
similar structures) which are shared with other nodes; it is public, but it is 
unique
[15:05:54] <evanbd> toad_: Right.  And that salt is not the same salt as is 
used locally for the key -> location in store mapping, right?
[15:06:06] <toad_> yeah
[15:06:10] <toad_> that should be separate
[15:06:24] <toad_> ok
[15:06:36] <toad_> so we can use either bloom filters or the short-hashes 
proposal?
[15:06:37] <sdiz> hmm... we don't store the original key in the current data 
structure.
[15:06:44] <evanbd> So...  right now the plan is that my node chooses a 
(secret) local salt, and a published unique salt for the shared filters.
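
The salting scheme being described amounts to something like the following (a 
hedged Java sketch; the names and the 64-bit truncation are illustrative):

import java.security.MessageDigest;

public class SharedFilterHash {
    // Derive the short hash that goes into the *shared* structure, using the
    // node's public per-node salt (distinct from the secret salt used for
    // the on-disk key -> store-location mapping).
    public static long sharedHash(byte[] publicSalt, byte[] routingKey)
            throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(publicSalt); // unique per node, but public
        md.update(routingKey);
        byte[] d = md.digest();
        long h = 0;
        for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
        return h; // callers truncate further to however many bits they need
    }
}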
[15:07:03] <toad_> and the chunk size issue is a non-issue if we have 1MB 
filters, for any plausible store size?
[15:07:21] <evanbd> Why not instead have the node that's *receiving* the bloom 
filter choose the salt?
[15:07:36] <toad_> evanbd: because we'd have to regenerate it?
[15:08:13] <toad_> and we'd have to keep the full keys in ram (or on disk) for 
every hour ...
[15:08:20] <evanbd> Regenerate the shared salt?  I thought it hadn't been 
implemented yet!
[15:08:47] <toad_> evanbd: if the receiving node chooses the salt, then the 
sending node will need to generate a different bloom filter for each node
[15:09:13] <toad_> evanbd: also there are security issues if an attacker can 
try lots of different salts?
[15:10:02] <evanbd> Ah, yes, that would mean keeping independent copies of the 
bloom filters we send out :/
[15:10:09] <evanbd> OK, never mind then.
[15:11:02] <evanbd> But, if we do send hashes rather than bloom filters, this 
all becomes irrelevant?  Since we'd be using the same salt for the store and 
the publicized hashes?
[15:11:12] <toad_> no, we wouldn't
[15:11:18] <toad_> imho that is a security risk
[15:11:53] <toad_> if they know your store salt they can delete keys
[15:11:59] <evanbd> That costs in collision rate, then :/
[15:12:03] <toad_> hmmm?
[15:12:22] <toad_> it costs in bits
[15:12:35] <toad_> and in lookups
[15:12:49] <toad_> we'd be dealing with big sorted blocks of hashes, rather 
than with direct-mapped stuff
[15:13:02] <toad_> but we can't use direct-mapped stuff anyway because of the 
issues with old (pre-caching-change) keys
[15:13:04] <evanbd> True, if you're doing distinct salts, you can just make the 
transmitted hashtable larger than the store.
[15:13:25] <evanbd> Wait, you're sending a list of hashes, not a hashtable?
[15:13:28] <toad_> yes
[15:13:38] <toad_> binary search in ram is cheap, ram seeks are measured in 
nanoseconds
[15:13:50] <toad_> and it makes it really really flexible
[15:13:55] <toad_> we can receive 1 key if we want to
[15:14:25] <toad_> imho the security issues are identical for the two approaches
[15:15:11] <evanbd> Hmm, I suppose it does at that.  Somewhat less 
space-efficient, though.
[15:15:40] <toad_> evanbd: well what's the alternative? a lossy hashtable? that 
is not space efficient either, unless it is direct-mapped
[15:15:49] <toad_> and imho direct-mapped is a significant security issue
[15:16:03] <evanbd> direct-mapped = use the same salt as privately?
[15:16:20] <toad_> yes, one cell in the hashtable equals the key stored at that 
location in the datastore
[15:17:51] <toad_> as in your original proposal for optimising the in-ram 
structures for the store
[15:18:01] <evanbd> If you want a fp rate of 15 ppm, that means you need 36 bit 
hashes if you have 1M keys.  The 23 bits/key bloom filter wins on space unless 
you compress in ram.
[15:18:14] <toad_> hmmm?
[15:18:26] <evanbd> For the sorted list of hashes, that is
[15:18:54] <toad_> 23 bits false positives = 16 bits hash, then you add upper 
bound log2 of the number of keys
[15:18:56] <toad_> correct?
[15:19:24] <evanbd> Yes.
[15:20:14] <toad_> ok so 500GB store is ((500*1024^3)/(32768+1024))*2 keys = 
31,775,030
[15:20:41] <evanbd> So once log2(# keys) > 7, the bloom filter wins.  Of 
course, you can compress a sorted list of hashes by storing as a carefully 
structured binary tree, but that gets complicated.
[15:20:42] <toad_> so 25 + 16 = 41 bits ...
[15:21:02] <toad_> hmmm
[15:21:12] <toad_> whereas bloom filter is _always_ 23 bits per key?
[15:21:17] <evanbd> Right.
[15:21:43] <toad_> ok so bloom filter sharing is always better on initial 
transmission of a populated filter, and on storage
[15:22:21] <evanbd> Alternate proposal: transmitted filter is a hashtable, 
similar to the direct-mapped case.  Except you use a different salt, use more 
buckets than the on-disk structure, and do slightly deeper quadratic probing.
[15:22:44] <toad_> right ... but doesn't that give poor false positives?
[15:22:57] <evanbd> The lower load factor + deeper probing should keep 
lossiness to an acceptable level.
[15:23:05] <toad_> do you have a formula?
[15:24:01] <toad_> hmmm
[15:24:08] <toad_> in fact, that gives false negatives, doesn't it
[15:24:08] <toad_> ?
[15:24:15] <evanbd> The direct mapped hash table has a fp rate = 
2^(-(transmitted hash size in bits) + log2(number of probes))...  so 18 bits 
and 4 probes = 2^(-18+2) = 2^-16 ~= fp rate for 23 bit/key bloom filter
[15:24:18] <toad_> as well as false positives
[15:24:46] <evanbd> The fp rate formula for the non-direct-mapped case is the 
same, except it's improved slightly by the lower load factor.
[15:25:06] <evanbd> Since some of the probes will return null rather than a 
possible collision.
[15:25:31] <evanbd> But keys that don't collide in the store, and then do 
collide in the filter, will result in false negatives.
[15:26:04] <toad_> right, and to compensate for that, we increase the size, and 
we end up with something bigger than a bloom filter?
[15:26:15] <evanbd> I'm not certain.
[15:26:57] <toad_> hmmm
[15:27:10] <toad_> how come the above formula doesn't incorporate the size of 
the table?
[15:27:32] <evanbd> If we start at the 18 bits/key number, increase the probe 
count to 8 (going to 19 bits to compensate), and increase the size by 5%, we're 
at 20 bits/key
[15:29:26] <toad_> how come the above formula doesn't incorporate the size of 
the table?
[15:29:29] <evanbd> For a 1M entry table, you produce two hashes.  One is a 20 
bit hash, used to select the table index.  The second is a 16 bit hash that 
gets stored at that location.  The table index portion doesn't need to be 
transmitted; the effective total size (20 + 16 = 36 bits) is hidden and grows 
as the number of entries does, so the term cancels out.
[15:29:57] <toad_> ah, so the "transmitted hash size in bits" is the 20 bit 
hash used to select the table index?
[15:30:05] <toad_> not the stored 16 bit hash?
[15:30:27] <toad_> or is it the sum of both?
[15:31:18] <toad_> it's the 16 bit hash
[15:31:18] <toad_> ok
[15:31:25] <evanbd> The collision rate is equivalent to transmitting a sorted 
list of 20+16=36 bit hashes
[15:32:04] <toad_> i still don't see why if i halve the size of the table i get 
the same false positives rate
[15:32:15] <evanbd> *provided* you guarantee that the 20-bit index hash portion 
doesn't collide, which the direct-mapped case does
[15:32:42] <toad_> ah
[15:32:50] <toad_> so if it's not direct-mapped, it's different
[15:32:55] <evanbd> Yeah.
[15:33:15] <toad_> so it looks to me like it's fairly clear cut that we need to 
go with bloom filters
[15:33:30] <toad_> but otoh the chunk size issue is a non-issue, provided we 
allow a few percent slack
[15:33:35] <evanbd> Wait...  no, actually, I'm not sure it is different.
[15:35:11] <evanbd> The fp rate is (fp rate per comparison) * (number of 
comparisons)
[15:35:26] <evanbd> In the sorted list case, you're comparing vs *every hash in 
the list*
[15:35:53] <evanbd> In the hashtable case, direct mapped or otherwise, number 
of comparisons = probe depth
[15:36:35] <toad_> ok so 2^-36 * 4 with 4 probes, 1 million entries, each 16 
bit?
[15:37:20] <toad_> that seems unlikely ...
[15:37:33] <evanbd> If you have a 1M entry sorted list, with 36 bit hashes, 
then the collision rate is 2^-36 * 2^20 = 2^-16
[15:38:05] <evanbd> If you have a 1M entry hash table, with 4 probes and 18-bit 
entries, then the collision rate is 2^-18 * 4 = 2^-16
[15:38:32] <toad_> why the * 2^20 ?
[15:38:38] <toad_> oh i see
[15:38:39] <toad_> ok
[15:38:43] <toad_> hmmm
[15:39:05] <toad_> i still don't understand the hashtable version though, it 
apparently doesn't affect false positives if the table size is 1 ?
[15:39:29] <toad_> imho if the table size is 1 then the collision rate is 
approximately 100% :)
[15:39:46] <toad_> you are assuming table size = number of keys ?
[15:39:58] <evanbd> Note that from an information-theoretic approach, the 
sorted list is compressible, so the two should be (nearly?) equivalent.  From a 
practical approach, getting information-theory limited compression of weird 
data like that sounds hard :)
[15:40:17] <toad_> what if the table size is not equal to the number of keys?
[15:40:37] <evanbd> No, if the table size is 1, then only keys with the same 
hash as the one key you've stored collide!
[15:40:49] <toad_> ok
[15:41:00] <evanbd> That presents as a reduced load factor; it provides a 
linear reduction in fp rate.
[15:41:32] <evanbd> If the table is only 95% full, then 5% of probes will 
return a null result rather than a potential collision.
[15:41:41] <toad_> collision rate != false positives rate
[15:42:01] <toad_> ?
[15:42:04] <evanbd> s/potential collision/potential fp/
[15:42:20] <toad_> ok...
[15:42:44] <toad_> so the false NEGATIVE rate is the other parameter, and that 
is controlled by the table size and the number of probes
[15:42:51] <evanbd> So if I have a 95% full table with 18 bit hashes, and do 4 
probes, I get a fp rate of 0.95 * 2^-18 * 4
[15:42:53] <toad_> well it's the other output
[15:43:24] <toad_> we would need to fix an acceptable false negative rate
[15:43:36] <evanbd> Yes, clearly.
[15:43:48] <toad_> both parameters are linear ...
[15:43:55] <toad_> right?
[15:44:01] <evanbd> I propose 1%, for no reason other than I like it :)
[15:44:13] <toad_> ok, how would we achieve 1% false negatives?
[15:44:31] <evanbd> No, the quadratic probing makes it better than linear, I 
believe.
[15:45:31] <evanbd> And cuckoo hashing or similar would likely do even better.
[15:46:29] <toad_> well, bloom filters have zero false negatives ...
[15:46:51] <evanbd> The false negative rate is (number of collisions that occur 
while inserting all keys into the table) / (total number of keys).
[15:47:59] <toad_> which is?
[15:48:30] <toad_> one issue here is that we would have to maintain a separate 
structure, over the long term, for the datastore
[15:48:38] <evanbd> I don't have a formula that accounts for quadratic probing. 
 There may be one, but it's trivial to simulate if not.
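
The simulation evanbd is offering here is indeed short; a minimal Monte Carlo 
sketch (Java; the 1M cells / 95% load / 4 probes parameters are just the 
examples from this discussion) might be:

public class FalseNegativeSim {
    public static void main(String[] args) {
        int tableSize = 1 << 20;                 // 1M cells (power of two)
        int keys = (int) (tableSize * 0.95);     // 95% load factor
        int probes = 4;
        boolean[] occupied = new boolean[tableSize];
        java.util.Random rng = new java.util.Random(42);
        int dropped = 0;
        for (int k = 0; k < keys; k++) {
            int idx = rng.nextInt(tableSize);    // stand-in for a salted hash
            boolean placed = false;
            for (int i = 0; i < probes && !placed; i++) {
                int slot = (idx + i * i) & (tableSize - 1); // quadratic probe
                if (!occupied[slot]) { occupied[slot] = true; placed = true; }
            }
            if (!placed) dropped++;              // dropped key = false negative
        }
        System.out.printf("false negative rate ~ %.4f%n",
                (double) dropped / keys);
    }
}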
[15:48:42] <toad_> with bloom filters we can eventually get rid of the 
datastore filter
[15:49:22] <evanbd> Also, while bloom filters have zero false negatives, the 
current system has 100% :)
[15:49:43] <toad_> old nodes need a separate filter for the store; new nodes 
can just use the bloom filters
[15:49:56] <toad_> whereas a hashtable based system would need both even on new 
nodes
[15:50:34] <evanbd> True.  But the memory concerns are dominated by our peers' 
filters, right?
[15:50:41] <toad_> yeah
[15:51:12] <toad_> well, if it can be done acceptably well with a hashtable 
then it might make sense
[15:51:16] <evanbd> I think we can get acceptable rates with ~ 20 bits/key from 
hash tables.
[15:51:19] <toad_> i'm leaning towards bloom filters at the moment
[15:51:37] <evanbd> IMHO the big gain for hash tables is that they don't have 
to be rebuilt; all updates are incremental.
[15:51:52] <evanbd> Bloom filters do, and even the counting filter will 
saturate eventually.
[15:52:07] <toad_> why do they not have to be rebuilt?
[15:52:20] <toad_> if they are memory mapped then losses are inevitable sooner 
or later
[15:52:33] <toad_> because something didn't hit disk
[15:53:17] <evanbd> OK, they don't have to be rebuilt during normal operation.
[15:53:38] <toad_> hmmm?
[15:54:04] <evanbd> Ignore crashes.  Bloom filters saturate!
[15:54:13] <toad_> ok
[15:54:22] <toad_> but we will still have bloom filters for the store!
[15:54:27] <toad_> we just won't have them for sharing
[15:54:36] <toad_> so is it such a big gain?
[15:54:43] <evanbd> Why not use the hash table for the store?
[15:54:46] <toad_> fair point
[15:54:49] <toad_> okay ... hmmm
[15:55:14] <evanbd> It's (slightly) more memory efficient; the big reason not 
to is to avoid writing and debugging the code.
[15:55:16] <toad_> we never need to rebuild the hash table for the store, 
because we can correct it on the fly, when writing to it
[15:55:23] <evanbd> Right.
[15:55:39] <toad_> however, the shareable table we may need to rebuild
[15:55:49] <evanbd> Crashes require either a rebuild or a journal or something.
[15:56:06] <evanbd> Only in the case of a crash, right?
[15:56:20] <toad_> well ...
[15:56:24] <toad_> possibly
[15:56:38] <toad_> a crash, or we shut down cleanly but then lose power, or 
shut down cleanly and then oops etc
[15:57:00] <evanbd> If we shut down cleanly, then the data is guaranteed to be 
on disk...
[15:57:10] <toad_> no, not unless we sync during the shutdown
[15:57:20] <toad_> which i guess we should do, but it will make it take longer
[15:57:23] <evanbd> And the reason not to sync on shutdown is?
[15:57:57] <evanbd> It takes what, a second in the worst case?  Node shutdown 
is not an end-user relevant performance metric :)
[15:58:04] <toad_> ok so on shutdown we sync both the store files and the store 
index files
[15:59:00] <toad_> on an unclean shutdown we will need to rebuild the shareable 
filter ... but it will only affect a few keys ...
[15:59:10] <toad_> so a journal might make sense
[16:00:01] <toad_> new blocks go into a data structure and are written to the 
journal, every so often we sync the journal, write the blocks, sync the main 
file and delete the journal ...
[16:00:24] <evanbd> Yep.
[16:00:41] <toad_> ok, i'm starting to like this scheme
[16:00:51] <toad_> we need numbers
[16:01:06] <evanbd> And on restart you only have to replay the journal.
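
The write path being sketched here, as hedged pseudo-Java (names are 
illustrative, not the actual store code):

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

public class JournaledIndex {
    private final RandomAccessFile mainFile;
    private final RandomAccessFile journal;
    private final List<byte[]> pending = new ArrayList<byte[]>();

    JournaledIndex(File main, File jnl) throws IOException {
        mainFile = new RandomAccessFile(main, "rw");
        journal = new RandomAccessFile(jnl, "rw");
    }

    // On every new block: append the entry to the journal and sync it,
    // so the entry survives a crash from this point on.
    synchronized void add(byte[] entry) throws IOException {
        journal.write(entry);
        journal.getFD().sync();
        pending.add(entry);
    }

    // Every so often: write the pending entries to the main file, sync it,
    // and only then truncate ("delete") the journal.
    synchronized void checkpoint() throws IOException {
        for (byte[] e : pending) mainFile.write(e);
        mainFile.getFD().sync();
        journal.setLength(0);
        journal.seek(0);
        pending.clear();
    }
    // After an unclean shutdown, replaying whatever is in the journal
    // restores the few entries written since the last checkpoint.
}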
[16:01:29] <toad_> a good false positives rate can be achieved simply with 4 
probes and 18 bits; but a good false negatives rate is not so clear
[16:01:29] <evanbd> OK, I can simulate the hashtable fp / fn rates if you like.
[16:01:37] <toad_> ok
[16:01:43] <evanbd> What's the target number for negatives?
[16:01:56] <toad_> we need enough data to be sure that it doesn't vary 
drastically with different store sizes
[16:02:11] <toad_> evanbd: imho we should simulate 1%, 0.1% and 0.01%
[16:02:29] <toad_> imho 10% is too high
[16:02:47] <evanbd> OK.  on store sizes from, say, 300k keys to 10M keys?
[16:02:56] <toad_> well, 500GB is 30M keys
[16:03:06] <toad_> 100MB is the minimum and is rather less than that
[16:03:24] <toad_> 6k keys
[16:03:38] <toad_> so we should definitely simulate store sizes in that range 
as both are reasonable atm imho
[16:04:54] <evanbd> OK.  Once the size gets large it shouldn't matter, but I'll 
confirm that.
[16:05:58] * toad_ mailing devl
[16:12:48] <toad_> evanbd: if we use bloom filters ... if we are using more 
memory anyway for the peers' bloom filters, we can maybe use a larger counting 
width for our own filters ... I agree that a hashtable might be better if it 
can achieve a reasonable false negatives rate though
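
Since toad_'s last point mentions counter width: a minimal counting bloom 
filter sketch (illustrative Java, not Freenet's store code), showing why 
saturation eventually forces a rebuild:

public class CountingBloomFilter {
    private final byte[] counters;     // one small counter per cell
    private final int hashes;
    private static final int MAX = 15; // e.g. 4-bit counting width

    public CountingBloomFilter(int cells, int hashes) {
        this.counters = new byte[cells];
        this.hashes = hashes;
    }

    private int cell(long keyHash, int i) {
        // derive the i-th probe position from a 64-bit salted key hash
        long h = keyHash * 0x9E3779B97F4A7C15L + i;
        return (int) Long.remainderUnsigned(h, counters.length);
    }

    public void add(long keyHash) {
        for (int i = 0; i < hashes; i++) {
            int c = cell(keyHash, i);
            if (counters[c] < MAX) counters[c]++; // saturates at MAX and sticks
        }
    }

    public void remove(long keyHash) {
        for (int i = 0; i < hashes; i++) {
            int c = cell(keyHash, i);
            // a saturated counter can no longer be decremented safely;
            // enough of these and the filter must be rebuilt
            if (counters[c] > 0 && counters[c] < MAX) counters[c]--;
        }
    }

    public boolean mightContain(long keyHash) {
        for (int i = 0; i < hashes; i++)
            if (counters[cell(keyHash, i)] == 0) return false;
        return true;
    }
}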