On 04/04/14 23:10, Robert Hailey wrote:
> On 2014/04/04 (Apr), at 2:16 PM, Matthew Toseland wrote:
>
>> For several reasons I think we need to discuss the dubious security
>> assumption of "it's hard to probe your peers' datastores", and the
>> performance tradeoffs that go with it ...
>>
>> Why is this important now?
>> - Darknet bloom filters. Okay, it's darknet; there's more trust; so
>> that's okay then? Maybe.
>> - Opennet bloom filters (for long-lived peers). We'll need to
>> identify long-lived peers anyway for tunnels and security (MAST
>> countermeasures). So it'd be good to do bloom filter sharing with
>> them.
>> - Opportunistic data exchange (possibly with CBR) (on both darknet
>> and opennet).
>> - Broadcast probes at low HTL. Same effect as bloom filter sharing,
>> but they work with low-uptime nodes. Can be made cheap; we don't
>> have to wait for them before forwarding, and because latency isn't
>> a big deal we can aggregate them. (We might want to wait for them,
>> with timeouts, before finishing with DNF.) Would probably have to
>> be non-forwardable, to cut costs ... so it's a true probe. But then
>> the security model is the same as for bloom filter sharing - except
>> for the bandwidth cost, which makes probing bloom filters a bit
>> more costly...
>
> This is in line with the saying that it is easier to move an
> operation ("do you have data X?") than to move data ("here's all
> the data I have!").
>
> In my opinion, much of this almost-reached-the-data stuff (bloom
> filters, data probes, FOAF, etc.) serves to hide some deep bug or
> design flaw; that is, to make a broken system usable.

There are good reasons for the "brokenness". Datastores are not all
the same size; the sink node for the key you are inserting might have
a rather small store. Or it might be offline, although we try to deal
with that to some degree. But it might still be cached nearby.

> Passing INSERTs only to veteran nodes & having an "outer ring" of
> connectedness (only applicable to opennet) might fix the lower level
> issues.

Then what do you do with all the other nodes? Are you suggesting that
we go with the original proposal, where every node which isn't high
bandwidth and high uptime is transient? IMHO requesting from
unreliable nodes is perfectly acceptable provided we are below the
HTL threshold (so that MAST attackers still have to put in the
resources to become "core" nodes).

> IIRC, the motivation for bloom filter sharing was to hide the lookup
> from your peer; the theory being... the fact that your node has a
> particular datum is less interesting or volatile than the fact that
> someone is requesting it.

No. The motivation for bloom filter sharing is *performance*.
Simulations say it gains 10% success rates. IMHO given the
inhomogeneity of the network it could be a lot more than that, but I
don't know if the simulations took that into account.

> It's a bit curious, but intriguing, that you mention aggregating the
> data probes... seems kinda like hinting to your neighbors: "I'm
> working on X, Y, & Z... any leads?"... esp. if we need some packet
> padding anyway... then, well... in the common end case *most* of our
> neighbors will have already seen the request (right?)... so I'm not
> sure how much this buys us.

Why would they already have seen the request? The request goes to
~ 25 nodes. Each of them has up to 100 neighbours. Even if the
topology is poor they can't all have been visited already.
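For concreteness: all of these mechanisms reduce to asking "do you
have X?" about a peer's store, and with bloom filter sharing the
question is answered by a purely local lookup against a filter the
peer has sent us. Roughly like this - all the names are invented for
illustration, this is not fred's actual code:

    // A fixed-size Bloom filter over routing keys, as a peer might
    // share it. Hypothetical naming, purely to pin down the idea.
    import java.util.Arrays;
    import java.util.BitSet;

    final class SharedStoreFilter {
        private final BitSet bits;
        private final int sizeBits;
        private final int numHashes;

        SharedStoreFilter(int sizeBits, int numHashes) {
            this.bits = new BitSet(sizeBits);
            this.sizeBits = sizeBits;
            this.numHashes = numHashes;
        }

        // Derive the i-th bit index from the routing key by double
        // hashing: one base hash plus i times an odd stride.
        private int index(byte[] routingKey, int i) {
            int h1 = Arrays.hashCode(routingKey);
            int h2 = (h1 ^ (h1 >>> 16)) | 1; // force odd stride
            return Math.floorMod(h1 + i * h2, sizeBits);
        }

        // Called as the peer sends us (or updates) its filter.
        void add(byte[] routingKey) {
            for (int i = 0; i < numHashes; i++)
                bits.set(index(routingKey, i));
        }

        // No false negatives: false means the peer definitely lacks
        // the key; true means it *might* have it, so try it first.
        boolean mightHave(byte[] routingKey) {
            for (int i = 0; i < numHashes; i++)
                if (!bits.get(index(routingKey, i)))
                    return false;
            return true;
        }
    }

The routing change is then tiny: before normal location-based
routing, try any connected peer whose filter claims the key. False
positives cost one wasted hop; false negatives don't happen, which is
presumably where the simulated 10% comes from.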
>> What is the model?
>> - Censorship: We want it to be hard for bad guys to identify nodes
>> containing a specific key and eliminate them. Classically, fetching
>> a key tells you where it's stored (the data source, used for path
>> folding), but in fetching it you have propagated it. The problem
>> with this theory is you've only propagated it to *caches*, not
>> stores.
>
> Hmm... what if... whenever an item drops from the cache, if it is a
> small CHK (i.e. one data packet, not a superblock) we turn it into a
> FAST_INSERT (a one-shot insert, no thread/state required)... you
> just drain your cache back into the network/store?

The capacity of the network is finite. Every time we drop a key we
insert the lost key -> the nodes that store it insert their lost keys
-> load balloons real fast. Unless it just goes to nearby nodes at
1 hop who happen to have space, in which case it may or may not be
findable - but is more likely to be found if we have bloom filter
sharing...

>> -- Does this mean we need to give more protection to the store than
>> the cache? E.g. only do bloom filter sharing for stores, only read
>> the store for broadcast probes?
>
> Wouldn't that be providing even more information to an attacker, if
> we let them differentiate between what is stored versus cached?
> Although... even just the address of the node & the data might give
> a good indication of that...

>> -- Possibly we could avoid path folding if we served it from the
>> store? However this might make path folding less effective...
>
> ...by one hop, and arguably make the attack less effective... by one
> hop... right?

The attack is to take out the nodes which have the data *in the
store*. You mean that they could get a node close to the node which
stores the data, and then ask its peers, thus determining whether
they store the data? Hmmm, very possibly...

>> - Plausible deniability: The bad guys request a key from you. You
>> return it. It might be in your store or it might not be; you might
>> have forwarded the request, and the attacker can't tell. This is
>> IMHO legalistic bull****. It's easy to tell with a timing attack
>> (and whatever randomisation we add won't make it much harder, as
>> long as you are fetching multiple keys). Also,
>> RejectedOverload{local=false} implies it came from downstream,
>> though probing would use minimal HTL so might not see that... (But
>> at least *that* is fixable; it is planned to generate
>> RejectedOverload{local=false} locally when we are over some
>> fraction of our capacity, as part of load management reforms.)
>>
>> The security issues are doubtful IMHO. So it really comes down to
>> censorship attacks... The problem with censorship attacks is that
>> inserts store the key on ~ 3 nodes; it's cached on lots of nodes,
>> but the CHK cache turnover is very high; if you take out those 3
>> nodes, the data goes away very quickly. So the classic attack of
>> "fetch it and kill the data source" may actually work - if you care
>> about blocking single keys, or can kill lots of nodes (e.g. all
>> elements of a splitfile middle layer etc).
>
> All the more reason to (somehow) drain caches back into the network
> stores, and oddly enough... the *least-requested* may be the most
> important, in this case.

See above.
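To put a number on the timing attack mentioned above: a local store
hit and a one-hop forward differ by a full network round trip, and
whatever jitter we add averages out once the attacker fetches enough
keys. A toy model - every figure below is invented:

    // Toy model of the timing attack on "plausible deniability".
    // All latency figures invented; the point is the statistics.
    import java.util.Random;

    final class TimingAttackSketch {
        public static void main(String[] args) {
            Random rng = new Random();
            double localMs = 15;            // answered from our own store
            double forwardedMs = 15 + 120;  // plus one hop's round trip
            double jitterMs = 300;          // defender's randomisation
            int samples = 500;              // attacker fetches many keys

            double meanLocal = 0, meanForwarded = 0;
            for (int i = 0; i < samples; i++) {
                meanLocal += localMs + rng.nextDouble() * jitterMs;
                meanForwarded += forwardedMs + rng.nextDouble() * jitterMs;
            }
            meanLocal /= samples;
            meanForwarded /= samples;

            // Individual samples overlap heavily (the jitter is larger
            // than the hop gap), but the means still differ by ~120ms:
            // the noise shrinks as 1/sqrt(n), the hop delay doesn't.
            System.out.printf("local: %.1f ms, forwarded: %.1f ms%n",
                    meanLocal, meanForwarded);
        }
    }

Randomisation only defeats a single measurement; against an attacker
who can keep asking, it's a constant offset plus noise that averages
away.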
>> Do we need even more "put it back where it should be" mechanisms?
>> E.g. if a sink node finds the data from a store probe it should
>> store it? Would this help with censorship attacks?
>
> Wouldn't that require a *huge* lookup table (or bloom filter?!) a la
> RecentlyFailed?

To quench reinserts for popular keys when they are already running,
because everyone is requesting the same key and 1/100th of them are
reinserting it? That could take some memory, or we could even put it
on disk, yes. OTOH it's not really needed, judging by the
insert:request ratio at the moment?
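If we ever did need it, a bounded, expiring table in the style of
RecentlyFailed would do. A sketch, with invented names (this is not
the actual RecentlyFailed code):

    // Bounded, expiring "recently reinserted" table. The TTL and the
    // size cap below are made-up figures.
    import java.util.LinkedHashMap;
    import java.util.Map;

    final class RecentReinserts {
        private static final long TTL_MS = 30 * 60 * 1000L; // 30 min
        private static final int MAX_ENTRIES = 100_000;     // memory cap

        // Access-ordered LRU map: eldest entries drop out when full.
        private final LinkedHashMap<String, Long> seen =
            new LinkedHashMap<String, Long>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(
                        Map.Entry<String, Long> e) {
                    return size() > MAX_ENTRIES;
                }
            };

        // Returns true if a reinsert of this key should actually run;
        // false quenches it because one ran recently.
        synchronized boolean shouldReinsert(String routingKeyHex) {
            long now = System.currentTimeMillis();
            Long last = seen.get(routingKeyHex);
            if (last != null && now - last < TTL_MS)
                return false;
            seen.put(routingKeyHex, now);
            return true;
        }
    }

It doesn't need to be exact: quenching a reinsert we should have run
just leaves us where we are today, so a lossy table (or a bloom
filter) is fine.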
>> The average store turnover is unknown but guesswork suggests it's
>> around 2 weeks ...
>
> That sounds quite disagreeable. I'd much rather the original design
> goal of a nearly-infinite vat of nearly-permanent storage. :-)

Right. But the question is, is the poor data retention simply due to
the ratio of new stuff being put in to the size of people's
datastores? If so, there's not much we can do, short of bigger
storage requirements or even slower inserts.

> Do you mean "time until the data is not reachable", or a node's
> stores *actually* getting so much data that they are rolling over?

The sinks for a given key shouldn't change much. We see ~ 2 weeks
data retention for the data being findable. The question is, is this
because stores are rolling over quickly (because they are small and
have a relatively large number of inserts), or is it because of
routing/uptime issues, e.g. the sink nodes for the key are all
offline, or can't be reached due to load problems?

This is testable: we just need to probe the average store rollover
time!
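The probe itself is mostly arithmetic each node can do locally and
report: store capacity divided by the rate of accepted writes. With
invented figures (both numbers below are made up):

    // Expected store rollover time = capacity / accepted write rate.
    // Illustrative figures only.
    final class RolloverEstimate {
        // Seconds until a full store has been overwritten once.
        static double rolloverSeconds(long storeBytes,
                                      double writeBytesPerSec) {
            return storeBytes / writeBytesPerSec;
        }

        public static void main(String[] args) {
            long store = 10L * 1024 * 1024 * 1024; // 10 GiB store
            double rate = 8 * 1024;                // 8 KiB/s of inserts
            double days = rolloverSeconds(store, rate) / 86400;
            // ~15 days with these made-up numbers - suspiciously close
            // to the observed ~2 weeks of findability.
            System.out.printf("store rolls over in ~%.1f days%n", days);
        }
    }

If the probed rollover times cluster near the observed ~2 weeks,
retention is simply store size versus insert volume; if they come out
much longer, the problem is routing/uptime.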