On Friday 19 June 2009 18:50:04 Evan Daniel wrote:
> On Fri, Jun 19, 2009 at 1:36 PM, Matthew Toseland <toad at amphibian.dyndns.org> wrote:
> > On Friday 19 June 2009 17:37:00 Robert Hailey wrote:
> >> On Jun 18, 2009, at 9:07 PM, Matthew Toseland wrote:
> >> > CHUNK SIZE PROBLEM:
> >> > ===================
> >> >
> >> > Current plans call to split the keyspace up for each datastore, and assign keys to a manageable sized bloom filter for a section of the (hashed) keyspace. These can then be transferred separately, are useful immediately after being transferred, can be juggled in memory, etc. However, we cannot guarantee that the population of such a chunk will be small enough for the filter to be effective. Solving this requires either moving the boundaries dynamically (possibly continually), or making the chunks significantly larger than they would need to be in an ideal world.
> >>
> >> Right... but I believe we prescribed that the split would be based on a hashed value of the key and not by logical keyspace location, to avoid disproportionate chunks.
> >>
> >> That is to say... ideally a node is going to get a disproportionate amount of cache/store data about its network location.
> >>
> >> STORE_SIZE/MAX_BLOOM_CHUNK -> N_CHUNKS
> >> H(key, N_CHUNKS) => n & (0 < n < N)
> >> CHUNK[n].add(key)
> >>
> >> Wouldn't the problem be reduced to finding a well-scattering hash function then?
> >
> > Yes and no. How much variation will we have even if we divide by hashed keyspace? Hence how much bigger than the ideal splitting size do we need each chunk to be to maintain approximately the right false positives ratio?
>
> If the hash spreads things well, the number of keys in a single bloom filter should be normally distributed with mean ((total keys) / (number of filters)) and standard deviation sqrt((total keys) * (1 / (number of filters)) * (1 - 1 / (number of filters))). (It's a binomial distribution; any given key has a 1 / (number of filters) chance of landing in a specific filter.) For a 100GiB total store, we have 3E6 keys between store and cache (for CHKs, same for SSKs). That implies 17MiB of bloom filters. If we size the filters at 1MiB each, with 17 filters, we have 176k keys per filter on average. From the preceding formula, the standard deviation is 408 keys.
>
> Size variation is only a serious concern if the hash function is not distributing the keys at random. To be safe, we could slightly underfill the filters.
>
> Evan Daniel
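For reference, Evan's figures are easy to re-check. The following is a minimal back-of-the-envelope calculation (plain Java, not node code), using the same assumptions as the mail above: roughly 3e6 CHKs and seventeen 1MiB filters.

    // Back-of-the-envelope check of the keys-per-filter distribution quoted above.
    // Assumptions (from the mail): ~3e6 CHKs across store + cache for a 100 GiB
    // node, split over 17 bloom filter chunks. Each key lands in a given chunk
    // with probability p = 1/17, so the count per chunk is Binomial(totalKeys, p).
    public class ChunkSizeCheck {
        public static void main(String[] args) {
            double totalKeys = 3e6;      // approximate CHK count for a 100 GiB store+cache
            double numFilters = 17;      // 1 MiB filters, ~17 MiB of filters in total
            double p = 1.0 / numFilters;

            double mean = totalKeys * p;                        // ~176k keys per filter
            double stdDev = Math.sqrt(totalKeys * p * (1 - p)); // ~408 keys

            System.out.printf("mean keys/filter = %.0f%n", mean);
            System.out.printf("std dev          = %.0f%n", stdDev);
            // Even at +5 standard deviations the overshoot is ~2000 keys on ~176k,
            // i.e. just over 1%, which is why a few percent of slack per filter is enough.
        }
    }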
Ok, so chunk size is not a problem, as long as we allow a few percent extra.

Discussions on devl show that sending a sorted list of short hashes comes to around 41 bits per key on a 500GB store, versus 23 for bloom filters, so that is out (a worked comparison is sketched after the first part of the log below). However, a lossy hashtable of short hashes may be an option. evanbd is working on simulations of hashtables; the main issues are:
- We need a separate direct-mapped hashtable for the store, but we *never need to rebuild it*, as we can correct it on the fly.
- The separate hashtable for sharing will need to be rebuilt after an unclean shutdown, but a journal avoids this too. So no rebuilds ever, which IMHO is a significant performance/usability improvement.
- False positives are in the same region as with bloom filters, maybe 18 bits with 4 probes.
- False negatives exist with hashtables. This is bad; they don't exist with bloom filters. But they can be limited, and evanbd is simulating this. What proportion of false negatives is acceptable is an open question; simulations should show what various probe counts and table size/total keys ratios give.
- Update traffic is probably significantly more compact than with bloom filters, just as with a sorted list of short hashes.

IRC log:

[14:42:56] <evanbd> toad_: If you're worried about disk space, solving the problem of stores never filling would be a good start :)
[14:42:59] <toad_> occam's razor ... don't multiply your entities more than you absolutely need to
[14:43:21] <evanbd> True... complex UI is bad, and Freenet already doesn't fit metaphors users expect
[14:43:25] <toad_> evanbd: with a combined store/cache? maybe ...
[14:44:06] <toad_> it's not easy with sdiz's store but it is probably possible
[14:45:21] <toad_> we should seriously consider it though, it could gain us a lot of short-term storage space
[14:45:37] <evanbd> Speaking of store / cache implementation issues... have you or sdiz considered cuckoo hashing for the store? It's quite possible you could get better utilization and improved read performance compared to the current quadratic probing.
[14:45:49] <toad_> i haven't, talk to sdiz
[14:46:27] <evanbd> toad_: Yes, currently I think most of the space in the stores is not available to the network in any meaningful sense, if people actually use large stores (which I suspect is common, but don't know)
[14:46:39] <toad_> evanbd: "most" ??
[14:46:52] <toad_> you mean most of the store's half? yeah ...
[14:47:26] <toad_> it fills up very slowly, and given the bandwidth:storage ratios, most nodes with larger stores probably don't have a full store
[14:47:30] <evanbd> Yeah. I recently expanded it, but previously I had 100 GiB (store + cache), of which the cache was 95%+ full, and the store ~12%.
[14:47:55] <toad_> yeah ... we need to solve that, maybe before 0.8.0 ...
[14:47:58] <evanbd> I suspect that's more space than average, but 20-50GiB average wouldn't surprise me at all.
[14:48:28] <toad_> of course it's another complicating factor for bloom filter sharing ...
[14:48:56] <evanbd> Complicating to security, or implementation details?
[14:49:09] <toad_> implementation
[14:49:18] <toad_> splitting into chunks of reasonable size specifically
[14:49:32] <-- NullAcht15 has left this server (Read error: 110 (Connection timed out)).
[14:49:57] <evanbd> Why is that harder than with separate store/cache?
[14:50:22] <toad_> more variation in the number of keys
[14:51:56] <toad_> evanbd: any opinions on the proposal i sent to not transfer blooms at all but short key hashes?
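To put numbers on the 41-versus-23 bits per key claim in the summary above: a rough recalculation, assuming 32KiB data plus 1KiB header per block and a target false-positive rate of about 2^-16 (the same figures that come up later in the log). Illustrative arithmetic only, not node code.

    // Rough recalculation of the bits-per-key figures quoted in the summary:
    // sorted list of truncated key hashes vs. bloom filter, at the same target
    // false-positive rate (~2^-16 per lookup).
    public class BitsPerKey {
        public static void main(String[] args) {
            long storeBytes = 500L * 1024 * 1024 * 1024;  // 500GB as computed in the log (1024^3)
            long blockBytes = 32768 + 1024;               // data + header, as in the log
            long keys = storeBytes / blockBytes * 2;      // ~31.8M keys

            // Sorted list of short hashes: a lookup is compared against every entry,
            // so to stay at ~2^-16 overall we need 16 + ceil(log2(keys)) bits per hash.
            int log2Keys = 64 - Long.numberOfLeadingZeros(keys - 1);  // ceil(log2) = 25
            int sortedListBits = 16 + log2Keys;                       // 16 + 25 = 41

            // Bloom filter: ~1.44 * log2(1/fp) bits per key, independent of key count.
            long bloomBits = Math.round(1.44 * 16);                   // ~23 bits/key

            System.out.println("keys              = " + keys);
            System.out.println("sorted-list bits  = " + sortedListBits + " per key");
            System.out.println("bloom filter bits = " + bloomBits + " per key");
        }
    }

So the sorted list only beats the bloom filter for very small stores, which is why the summary rules it out for transmission of a fully populated structure.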
[14:52:56] <toad_> Size variation is only a serious concern if the hash function is not
[14:52:56] <toad_> distributing the keys at random. To be safe, we could slightly
[14:52:56] <toad_> underfill the filters.
[14:52:58] <toad_> hmmm ok
[14:53:29] <evanbd> toad_: I think it saves bandwidth and memory for the same false positive rate, for the same reasons I computed when suggesting doing it in memory. It also means not having to rebuild the filter periodically. These are good things.
[14:53:44] <toad_> if we have a combined cache/store then we'd just keep a single set of bloom filters for the combination ...
[14:53:47] <evanbd> There *might* be security implications, but I'm having trouble seeing them.
[14:54:22] <evanbd> Right, and if the hash function isn't distributing things well, you should write a paper on the SHA weakness you've found :)
[14:54:43] <toad_> yeah ... but that assumes we can afford all those SHA1's
[14:54:53] <toad_> in practice, the filter on the datastore uses a weak PRNG
[14:55:00] <toad_> because SHA1 is expensive
[14:55:18] <toad_> although imho we can afford one SHA1 ... sdiz is just using the PRNG after that iirc
[14:55:20] <evanbd> Surely not *that* expensive. We're doing write rates measured in fractional keys/s
[14:55:32] <toad_> anyway the point is for bloom filter sharing it's one SHA256 per key per peer
[14:55:38] <evanbd> SHA1 or SHA256?
[14:55:41] <toad_> so 20 or 50 SHA256's
[14:55:55] <toad_> assuming all peers have different nonces, which imho is a sensible security precaution
[14:56:27] <toad_> evanbd: well ... if we are using only a small part of the output, can we get away with SHA1? SHA1 is known to be somewhat weak ...
[14:56:36] <sdiz> toad_: we are using sha
[14:56:46] <evanbd> How long does it take? If we had a write rate of 5 key/s and 50 peers (fast node, improved network performance), that's 250 SHA/s. Surely they're not *that* expensive.
[14:56:49] <toad_> sdiz: you SHA it once and then use a PRNG, correct?
[14:57:03] <toad_> evanbd: it's not just writes, it's any request for a key
[14:57:41] <sdiz> oh, ya
[14:57:47] <toad_> evanbd: yeah it may not be a problem ... it's something to check ... 1sec, a bit of bsh ...
[14:58:13] <sdiz> that's MT. ...iirc, the first few entries of MT is not that random,
[14:59:03] <evanbd> Hmm, does it have to be on reads rather than writes? We salt the hashes before transmitting the filter, and send the salt with the filter. What if the recipient chose the salt, so all his filters used the same salt?
[14:59:41] <sdiz> we salt them when we write it in store .... so the recipient can't choose.
[15:00:22] <evanbd> sdiz: but weren't we planning to use different salts for store vs publicized filters? I suppose if we share hashes instead of bloom filters that wouldn't work.
[15:00:52] <toad_> bsh is a wonderful thing ...
[15:01:08] <toad_> 1M SHA256's in 10 seconds on one core of a phenom 2 2.2GHz x 4
[15:01:19] <toad_> so SHA256's are *fast*
[15:01:39] <toad_> sdiz: i guess you switched it because of store maintenance cpu usage?
[15:01:47] <evanbd> So 1k hashes/s represents 1% CPU load. If it's at all useful, that sounds worth it :)
[15:01:56] <toad_> evanbd: ok
[15:02:07] <toad_> evanbd: no i mean when a request comes in we have to check all the shared bloom filters
[15:02:32] <toad_> evanbd: so we do 50x SHA256's, and then 50x binary searches in the filters for each
[15:02:38] <evanbd> toad_: right, but we only have to do a separate hash for each filter if the salts are different.
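toad_'s bsh measurement above (about 1M SHA256's in 10 seconds on one core) is easy to reproduce with something like the stand-alone snippet below. It is a crude micro-benchmark for illustration only, not the actual bsh code; the point is just that the ~250 salted hashes per second needed for 50 peers is negligible.

    import java.security.MessageDigest;

    // Minimal version of the kind of throughput check toad_ ran in bsh:
    // hash a 32-byte stand-in for a routing key a million times and time it.
    public class ShaThroughput {
        public static void main(String[] args) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] key = new byte[32];   // stand-in for a routing key
            int n = 1000000;

            long start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                key[0] = (byte) i;       // vary the input slightly each round
                md.update(key);
                md.digest();             // digest() also resets the MessageDigest
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("%d SHA-256 in %.1f s (%.0f/s)%n", n, seconds, n / seconds);
        }
    }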
[15:02:43] <sdiz> hmm... that needs rebuilding the bloom filter ... i would wait for the full design before this change.
[15:03:00] <toad_> sdiz: I'm not proposing to change that
[15:03:26] <toad_> sdiz: we need a separate filter system for bloom filter sharing, at least in the short term, because it only includes keys added since we change the caching criteria
[15:03:43] <evanbd> I suppose publicizing the store salt doesn't really represent a big security concern.
[15:03:50] <toad_> evanbd: yes and there are security reasons to have different salts
[15:04:10] <toad_> evanbd: specifically to prevent a hash-based exploit from affecting more than one node
[15:04:15] <sdiz> i have not measured sha256... i use mt because i think we don't need that kind of security on bloom filter
[15:04:16] <evanbd> Different salts between which?
[15:04:42] <evanbd> Different salts for the one we publish vs what we use on the store, you mean?
[15:04:46] <toad_> evanbd: each node has its own salt for the bloom filters (or similar structures) which are shared with other nodes; it is public, but it is unique
[15:05:54] <evanbd> toad_: Right. And that salt is not the same salt as is used locally for the key -> location in store mapping, right?
[15:06:06] <toad_> yeah
[15:06:10] <toad_> that should be separate
[15:06:24] <toad_> ok
[15:06:36] <toad_> so we can use either bloom filters or the short-hashes proposal?
[15:06:37] <sdiz> hmm... we don't store the original key in the current data structure.
[15:06:44] <evanbd> So... right now the plan is that my node chooses a (secret) local salt, and a published unique salt for the shared filters.
[15:07:03] <toad_> and the chunk size issue is a non-issue if we have 1MB filters, for any plausible store size?
[15:07:21] <evanbd> Why not instead have the node that's *receiving* the bloom filter choose the salt?
[15:07:36] <toad_> evanbd: because we'd have to regenerate it?
[15:08:13] <toad_> and we'd have to keep the full keys in ram (or on disk) for every hour ...
[15:08:20] <evanbd> Regenerate the shared salt? I thought it hadn't been implemented yet!
[15:08:47] <toad_> evanbd: if the receiving node chooses the salt, then the sending node will need to generate a different bloom filter for each node
[15:09:13] <toad_> evanbd: also there are security issues if an attacker can try lots of different salts?
[15:10:02] <evanbd> Ah, yes, that would mean keeping independent copies of the bloom filters we send out :/
[15:10:09] <evanbd> OK, never mind then.
[15:11:02] <evanbd> But, if we do send hashes rather than bloom filters, this all becomes irrelevant? Since we'd be using the same salt for the store and the publicized hashes?
[15:11:12] <toad_> no, we wouldn't
[15:11:18] <toad_> imho that is a security risk
[15:11:53] <toad_> if they know your store salt they can delete keys
[15:11:59] <evanbd> That costs in collision rate, then :/
[15:12:03] <toad_> hmmm?
[15:12:22] <toad_> it costs in bits
[15:12:35] <toad_> and in lookups
[15:12:49] <toad_> we'd be dealing with big sorted blocks of hashes, rather than with direct-mapped stuff
[15:13:02] <toad_> but we can't use direct-mapped stuff anyway because of the issues with old (pre-caching-change) keys
[15:13:04] <evanbd> True, if you're doing distinct salts, you can just make the transmitted hashtable larger than the store.
[15:13:25] <evanbd> Wait, you're sending a list of hashes, not a hashtable?
[15:13:28] <toad_> yes
[15:13:28] --> bertodsera has joined this channel (n=quasseul at 95.133.127.64).
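The salt arrangement agreed above (a secret salt for the local key -> slot mapping, plus a separate public but per-node-unique salt for whatever structure is shared with peers) might look roughly like the sketch below. Class and method names here are hypothetical, purely to make the separation concrete; this is not the actual store code.

    import java.security.MessageDigest;
    import java.security.SecureRandom;

    // Hypothetical sketch of the two-salt arrangement discussed above: a secret
    // salt used only for the local store index, and a distinct public salt for
    // the structure shared with peers. Publishing the sharing salt reveals
    // nothing about the store layout, and a hash-based attack prepared against
    // one node's public salt does not carry over to any other node.
    public class TwoSalts {
        private final byte[] storeSalt = newSalt();    // secret, never leaves this node
        private final byte[] sharingSalt = newSalt();  // public, sent along with the shared filter/table

        private static byte[] newSalt() {
            byte[] salt = new byte[32];
            new SecureRandom().nextBytes(salt);
            return salt;
        }

        private static byte[] saltedHash(byte[] salt, byte[] routingKey) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(salt);
            md.update(routingKey);
            return md.digest();
        }

        byte[] storeIndexHash(byte[] routingKey) throws Exception {
            return saltedHash(storeSalt, routingKey);    // local slot selection only
        }

        byte[] sharedHash(byte[] routingKey) throws Exception {
            return saltedHash(sharingSalt, routingKey);  // goes into the advertised filter/table
        }
    }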
[15:13:38] <toad_> binary search in ram is cheap, ram seeks are measured in nanoseconds
[15:13:50] <toad_> and it makes it really really flexible
[15:13:55] <toad_> we can receive 1 key if we want to
[15:14:25] <toad_> imho the security issues are identical for the two approaches
[15:15:11] <evanbd> Hmm, I suppose it does at that. Somewhat less space-efficient, though.
[15:15:40] <toad_> evanbd: well what's the alternative? a lossy hashtable? that is not space efficient either, unless it is direct-mapped
[15:15:49] <toad_> and imho direct-mapped is a significant security issue
[15:16:03] <evanbd> direct-mapped = use the same salt as privately?
[15:16:20] <toad_> yes, one cell in the hashtable equals the key stored at that location in the datastore
[15:17:51] <toad_> as in your original proposal for optimising the in-ram structures for the store
[15:18:01] <evanbd> If you want a fp rate of 15 ppm, that means you need 36 bit hashes if you have 1M keys. The 23 bits/key bloom filter wins on space unless you compress in ram.
[15:18:14] <toad_> hmmm?
[15:18:26] <evanbd> For the sorted list of hashes, that is
[15:18:54] <toad_> 23 bits false positives = 16 bits hash, then you add upper bound log2 of the number of keys
[15:18:56] <toad_> correct?
[15:19:24] <evanbd> Yes.
[15:20:14] <toad_> ok so 500GB store is ((500*1024^3)/(32768+1024))*2 keys = 31,775,030
[15:20:41] <evanbd> So once log2(# keys) > 7, the bloom filter wins. Of course, you can compress a sorted list of hashes by storing it as a carefully structured binary tree, but that gets complicated.
[15:20:42] <toad_> so 25 + 16 = 41 bits ...
[15:21:02] <toad_> hmmm
[15:21:12] <toad_> whereas bloom filter is _always_ 23 bits per key?
[15:21:17] <evanbd> Right.
[15:21:43] <toad_> ok so bloom filter sharing is always better on initial transmission of a populated filter, and on storage
[15:22:21] <evanbd> Alternate proposal: transmitted filter is a hashtable, similar to the direct-mapped case. Except you use a different salt, use more buckets than the on-disk structure, and do slightly deeper quadratic probing.
[15:22:44] <toad_> right ... but doesn't that give poor false positives?
[15:22:57] <evanbd> The lower load factor + deeper probing should keep lossiness to an acceptable level.
[15:23:05] <toad_> do you have a formula?
[15:24:01] <toad_> hmmm
[15:24:08] <toad_> in fact, that gives false negatives, doesn't it?
[15:24:15] <evanbd> The direct mapped hash table has a fp rate = 2^(-(transmitted hash size in bits) + log2(number of probes))... so 18 bits and 4 probes = 2^-16 ~= fp rate for 23 bit/key bloom filter
[15:24:18] <toad_> as well as false positives
[15:24:46] <evanbd> The fp rate formula for the non-direct-mapped case is the same, except it's improved slightly by the lower load factor.
[15:25:06] <evanbd> Since some of the probes will return null rather than a possible collision.
[15:25:31] <evanbd> But keys that don't collide in the store, and then do collide in the filter, will result in false negatives.
[15:26:04] <toad_> right, and to compensate for that, we increase the size, and we end up with something bigger than a bloom filter?
[15:26:15] <evanbd> I'm not certain.
[15:26:57] <toad_> hmmm
[15:27:10] <toad_> how come the above formula doesn't incorporate the size of the table?
[15:27:32] <evanbd> If we start at the 18 bits/key number, increase the probe count to 8 (going to 19 bits to compensate), and increase the size by 5%, we're at 20 bits/key
[15:29:26] <toad_> how come the above formula doesn't incorporate the size of the table?
[15:29:29] <evanbd> For a 1M entry table, you produce two hashes. One is a 20 bit hash, used to select the table index. The second is a 16 bit hash that gets stored at that location. The table index portion doesn't need to be transmitted; the effective total size (20 + 16 = 36 bits) is hidden and grows as the number of entries does, so the term cancels out.
[15:29:57] <toad_> ah, so the "transmitted hash size in bits" is the 20 bit hash used to select the table index?
[15:30:05] <toad_> not the stored 16 bit hash?
[15:30:27] <toad_> or is it the sum of both?
[15:31:18] <toad_> it's the 16 bit hash
[15:31:18] <toad_> ok
[15:31:25] <evanbd> The collision rate is equivalent to transmitting a sorted list of 20+16=36 bit hashes
[15:32:04] <toad_> i still don't see why if i halve the size of the table i get the same false positives rate
[15:32:15] <evanbd> *provided* you guarantee that the 20-bit index hash portion doesn't collide, which the direct-mapped case does
[15:32:42] <toad_> ah
[15:32:50] <toad_> so if it's not direct-mapped, it's different
[15:32:55] <evanbd> Yeah.
[15:33:15] <toad_> so it looks to me like it's fairly clear cut that we need to go with bloom filters
[15:33:30] <toad_> but otoh the chunk size issue is a non-issue, provided we allow a few percent slack
[15:33:35] <evanbd> Wait... no, actually, I'm not sure it is different.
[15:35:11] <evanbd> The fp rate is (fp rate per comparison) * (number of comparisons)
[15:35:26] <evanbd> In the sorted list case, you're comparing vs *every hash in the list*
[15:35:53] <evanbd> In the hashtable case, direct mapped or otherwise, number of comparisons = probe depth
[15:35:56] <-- NullAcht15 has left this server (Read error: 110 (Connection timed out)).
[15:36:35] <toad_> ok so 2^-36 * 4 with 4 probes, 1 million entries, each 16 bit?
[15:37:20] <toad_> that seems unlikely ...
[15:37:33] <evanbd> If you have a 1M entry sorted list, with 36 bit hashes, then the collision rate is 2^-36 * 2^20 = 2^-16
[15:38:05] <evanbd> If you have a 1M entry hash table, with 4 probes and 18-bit entries, then the collision rate is 2^-18 * 4 = 2^-16
[15:38:32] <toad_> why the * 2^20 ?
[15:38:38] <toad_> oh i see
[15:38:39] <toad_> ok
[15:38:43] <toad_> hmmm
[15:39:05] <toad_> i still don't understand the hashtable version though, it apparently doesn't affect false positives if the table size is 1 ?
[15:39:29] <toad_> imho if the table size is 1 then the collision rate is approximately 100% :)
[15:39:46] <toad_> you are assuming table size = number of keys ?
[15:39:58] <evanbd> Note that from an information-theoretic approach, the sorted list is compressible, so the two should be (nearly?) equivalent. From a practical approach, getting information-theory limited compression of weird data like that sounds hard :)
[15:40:17] <toad_> what if the table size is not equal to the number of keys?
[15:40:37] <evanbd> No, if the table size is 1, then only keys with the same hash as the one key you've stored collide!
[15:40:49] <toad_> ok
[15:41:00] <evanbd> That presents as a reduced load factor; it provides a linear reduction in fp rate.
[15:41:32] <evanbd> If the table is only 95% full, then 5% of probes will return a null result rather than a potential collision.
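The equivalence evanbd derives above is easy to check numerically: a sorted list of 36-bit hashes over 2^20 keys and a 1M-entry table probed 4 times with 18-bit entries both come out at about 2^-16 false positives per lookup, the same ballpark as a 23 bits/key bloom filter. The snippet below just evaluates those expressions; it is a check of the arithmetic, not a design.

    // Numeric check of the false-positive expressions from the exchange above.
    public class FpRateCheck {
        public static void main(String[] args) {
            // Sorted list: 2^20 entries, 36-bit hashes; a lookup compares against every entry.
            double sortedListFp = Math.pow(2, -36) * Math.pow(2, 20);   // = 2^-16

            // Direct-mapped hash table: 18-bit stored entries, 4 probes per lookup.
            double tableFp = Math.pow(2, -18) * 4;                      // = 2^-16

            // Non-direct-mapped table at 95% load: empty slots can't produce a
            // false positive, so the rate improves linearly with the load factor.
            double lossyTableFp = 0.95 * Math.pow(2, -18) * 4;

            // Bloom filter at ~23 bits/key has fp ~= 2^-16 as well.
            double bloomFp = Math.pow(2, -16);

            System.out.println("sorted list  : " + sortedListFp);
            System.out.println("direct-mapped: " + tableFp);
            System.out.println("95% loaded   : " + lossyTableFp);
            System.out.println("bloom filter : " + bloomFp);
        }
    }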
[15:41:41] <toad_> collision rate != false positives rate
[15:42:01] <toad_> ?
[15:42:04] <evanbd> s/potential collision/potential fp/
[15:42:20] <toad_> ok...
[15:42:44] <toad_> so the false NEGATIVE rate is the other parameter, and that is controlled by the table size and the number of probes
[15:42:51] <evanbd> So if I have a 95% full table with 18 bit hashes, and do 4 probes, I get a fp rate of 0.95 * 2^-18 * 4
[15:42:53] <toad_> well it's the other output
[15:43:24] <toad_> we would need to fix an acceptable false negative rate
[15:43:36] <evanbd> Yes, clearly.
[15:43:48] <toad_> both parameters are linear ...
[15:43:55] <toad_> right?
[15:44:01] <evanbd> I propose 1%, for no reason other than I like it :)
[15:44:13] <toad_> ok, how would we achieve 1% false negatives?
[15:44:31] <evanbd> No, the quadratic probing makes it better than linear, I believe.
[15:45:31] <evanbd> And cuckoo hashing or similar would likely do even better.
[15:46:29] <toad_> well, bloom filters have zero false negatives ...
[15:46:51] <evanbd> The false negative rate is (number of collisions that occur while inserting all keys into the table) / (total number of keys).
[15:47:59] <toad_> which is?
[15:48:30] <toad_> one issue here is that we would have to maintain a separate structure, over the long term, for the datastore
[15:48:38] <evanbd> I don't have a formula that accounts for quadratic probing. There may be one, but it's trivial to simulate if not.
[15:48:42] <toad_> with bloom filters we can eventually get rid of the datastore filter
[15:49:22] <evanbd> Also, while bloom filters have zero false negatives, the current system has 100% :)
[15:49:24] --> DuClare has joined this channel (n=user at a81-197-104-54.elisa-laajakaista.fi).
[15:49:43] <toad_> old nodes need a separate filter for the store; new nodes can just use the bloom filters
[15:49:56] <toad_> whereas a hashtable based system would need both even on new nodes
[15:50:34] <evanbd> True. But the memory concerns are dominated by our peers' filters, right?
[15:50:41] <toad_> yeah
[15:51:12] <toad_> well, if it can be done acceptably well with a hashtable then it might make sense
[15:51:16] <evanbd> I think we can get acceptable rates with ~ 20 bits/key from hash tables.
[15:51:19] <toad_> i'm leaning towards bloom filters at the moment
[15:51:37] <evanbd> IMHO the big gain for hash tables is that they don't have to be rebuilt; all updates are incremental.
[15:51:52] <evanbd> Bloom filters do, and even the counting filter will saturate eventually.
[15:52:07] <toad_> why do they not have to be rebuilt?
[15:52:20] <toad_> if they are memory mapped then losses are inevitable sooner or later
[15:52:33] <toad_> because something didn't hit disk
[15:53:17] <evanbd> OK, they don't have to be rebuilt during normal operation.
[15:53:38] <toad_> hmmm?
[15:54:04] <evanbd> Ignore crashes. Bloom filters saturate!
[15:54:13] <toad_> ok
[15:54:22] <toad_> but we will still have bloom filters for the store!
[15:54:27] <toad_> we just won't have them for sharing
[15:54:36] <toad_> so is it such a big gain?
[15:54:43] <evanbd> Why not use the hash table for the store?
[15:54:46] <toad_> fair point
[15:54:49] <toad_> okay ... hmmm
[15:55:14] <evanbd> It's (slightly) more memory efficient; the big reason not to is to avoid writing and debugging the code.
[15:55:16] <toad_> we never need to rebuild the hash table for the store, because we can correct it on the fly, when writing to it
[15:55:23] <evanbd> Right.
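The false-negative simulation evanbd offers to run is straightforward to sketch. The toy version below (hypothetical parameters: 2^20 slots, 8 bounded quadratic probes, 95% load) just counts how many keys fail to find a free slot within the probe limit; those keys would be invisible to a peer, i.e. false negatives. It is an illustration of the method, not evanbd's actual simulation.

    import java.util.Random;

    // Minimal sketch of the kind of false-negative simulation discussed above:
    // insert N keys into a lossy hash table of M slots using bounded quadratic
    // probing, and count the keys that find no free slot within the probe limit.
    public class FalseNegativeSim {
        public static void main(String[] args) {
            int slots = 1 << 20;
            int maxProbes = 8;
            int keys = (int) (slots * 0.95);          // 95% load factor
            boolean[] occupied = new boolean[slots];  // only occupancy matters for this estimate
            Random rng = new Random(42);

            int dropped = 0;
            for (int i = 0; i < keys; i++) {
                int home = rng.nextInt(slots);        // stand-in for H(salt, key) mod slots
                boolean placed = false;
                for (int p = 0; p < maxProbes; p++) {
                    int slot = (home + p * p) % slots;  // quadratic probe sequence
                    if (!occupied[slot]) {
                        occupied[slot] = true;
                        placed = true;
                        break;
                    }
                }
                if (!placed) dropped++;
            }
            System.out.printf("dropped %d of %d keys (%.2f%% false negatives)%n",
                    dropped, keys, 100.0 * dropped / keys);
        }
    }

Sweeping the load factor and probe count in such a simulation is what would show whether the 1%, 0.1% and 0.01% targets discussed below are reachable at ~20 bits/key.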
[15:55:39] <toad_> however, the shareable table we may need to rebuild
[15:55:49] <evanbd> Crashes require either a rebuild or a journal or something.
[15:56:06] <evanbd> Only in the case of a crash, right?
[15:56:20] <toad_> well ...
[15:56:24] <toad_> possibly
[15:56:38] <toad_> a crash, or we shut down cleanly but then lose power, or shut down cleanly and then oops etc
[15:57:00] <evanbd> If we shut down cleanly, then the data is guaranteed to be on disk...
[15:57:10] <toad_> no, not unless we sync during the shutdown
[15:57:20] <toad_> which i guess we should do, but it will make it take longer
[15:57:23] <evanbd> And the reason not to sync on shutdown is?
[15:57:57] <evanbd> It takes what, a second in the worst case? Node shutdown is not an end-user relevant performance metric :)
[15:58:04] <toad_> ok so on shutdown we sync both the store files and the store index files
[15:59:00] <toad_> on an unclean shutdown we will need to rebuild the shareable filter ... but it will only affect a few keys ...
[15:59:10] <toad_> so a journal might make sense
[16:00:01] <toad_> new blocks go into a data structure and are written to the journal, every so often we sync the journal, write the blocks, sync the main file and delete the journal ...
[16:00:24] <evanbd> Yep.
[16:00:41] <toad_> ok, i'm starting to like this scheme
[16:00:51] <toad_> we need numbers
[16:01:06] <evanbd> And on restart you only have to replay the journal.
[16:01:29] <toad_> good false positives rate can be achieved simply by 4 probes and 18 bits; but good false negatives rate is not so clear
[16:01:29] <evanbd> OK, I can simulate the hashtable fp / fn rates if you like.
[16:01:37] <toad_> ok
[16:01:43] <evanbd> What's the target number for negatives?
[16:01:56] <toad_> we need enough data to be sure that it doesn't vary drastically with different store sizes
[16:02:11] <toad_> evanbd: imho we should simulate 1%, 0.1% and 0.01%
[16:02:29] <toad_> imho 10% is too high
[16:02:47] <evanbd> OK. on store sizes from, say, 300k keys to 10M keys?
[16:02:56] <toad_> well, 500GB is 30M keys
[16:03:06] <toad_> 100MB is the minimum and is rather less than that
[16:03:24] <toad_> 6k keys
[16:03:38] <toad_> so we should definitely simulate store sizes in that range as both are reasonable atm imho
[16:04:54] <evanbd> OK. Once the size gets large it shouldn't matter, but I'll confirm that.
[16:05:58] * toad_ mailing devl
[16:12:48] <toad_> evanbd: if we use bloom filters ... if we are using more memory anyway for the peers' bloom filters, we can maybe use a larger counting width for our own filters ... I agree that a hashtable might be better if it can achieve a reasonable false negatives rate though
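The journalled update flow toad_ describes at 16:00 (append new entries to a journal and sync it; every so often apply them to the main file, sync that, and delete the journal; on restart, just replay the journal) could be structured roughly as below. This is a hypothetical skeleton meant only to make the ordering of syncs explicit; it is not the actual store implementation, and the class and method names are made up.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical skeleton of the journalled update flow from the log:
    // new entries are appended to a journal and fsynced, then periodically
    // applied to the main table file, which is fsynced before the journal is
    // truncated. After an unclean shutdown the journal is simply replayed,
    // so the shared table never needs a full rebuild.
    public class JournalledTable {
        private final FileChannel journal;
        private final FileChannel mainTable;
        private final List<byte[]> pending = new ArrayList<byte[]>();

        public JournalledTable(Path journalPath, Path tablePath) throws IOException {
            journal = FileChannel.open(journalPath,
                    StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
            mainTable = FileChannel.open(tablePath,
                    StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
            replayJournal();   // bring the main table up to date after a crash
        }

        public synchronized void add(byte[] entry) throws IOException {
            journal.write(ByteBuffer.wrap(entry), journal.size());  // append to the journal
            journal.force(false);                                   // sync the journal
            pending.add(entry);
        }

        /** Called every so often: apply pending entries, sync, then drop the journal. */
        public synchronized void checkpoint() throws IOException {
            for (byte[] entry : pending) {
                applyToMainTable(entry);
            }
            mainTable.force(false);     // main file is durable before the journal goes away
            journal.truncate(0);
            journal.force(false);
            pending.clear();
        }

        private void replayJournal() throws IOException { /* read journal entries, applyToMainTable(), checkpoint() */ }
        private void applyToMainTable(byte[] entry) throws IOException { /* write the slot for this entry */ }
    }

On a clean shutdown, a final checkpoint() plus syncing the store files themselves covers the "sync both the store files and the store index files" step discussed in the log above.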
