On Wednesday 20 January 2010 16:25:18 alex wrote:
> Matthew Toseland wrote:
> 
> > On Wednesday 20 January 2010 10:37:12 alex wrote:
> >> Matthew Toseland wrote:
> >> 
> >> > 2) Write a killer file-sharing application (232 votes)
> >> > 
> >> > IMHO there are a number of issues here:
> >> > - Current data persistence sucks. We are working on it and need to do
> >> >   more work on it.
> >> > - It is relatively hard to search for files. We need a good WoT-based
> >> >   spamproof fast file search system.
> >> > - General demand from many folks for insert on demand. Maybe it worked
> >> >   on Frost, maybe it didn't, but Frost is dead. The main advantages of
> >> >   insert on demand are 1) that you can "share" a directory immediately,
> >> >   2) you don't need to wait for the insert, 3) the set of available
> >> >   data, or available data at a given speed, is larger. I am skeptical,
> >> >   there are several problems with it (trust, security, different views
> >> >   of "network pollution" ...), but in any case improving data
> >> >   persistence and making it easy to search for files are both
> >> >   important, and once we have these we can consider insert on demand.
> >> >   And there *are* options for relatively safe reinsert on demand,
> >> >   although they are messy...
> >> > 
> >> > Or am I being unduly dogmatic and long-termist here? Maybe a simple
> >> > WoT-based file search and share and reinsert on demand system would
> >> > gain us so many more users that we should do it anyway, even if insert
> >> > on demand is risky and not necessary long term?
> >> > 
> >> > This does rely on distributed searching, and it could be a while ...
> >> 
> >> This is only MHO, but I really think that Freenet could take off in user
> >> numbers if it were an out-of-the-box secure file-sharing replacement.
> >> It's only a matter of time before some app fills the gap that will
> >> appear, even if less private alternatives are doing well nowadays.
> > 
> > "Out of the box" is a challenge. Any sort of distributed search is not
> > going to work instantly. However, Perfect Dark takes a couple of hours to
> > download the indexes, so maybe it doesn't have to? Opennet needs to work
> > well out of the box and we need to do more work on that: Dealing with low
> > uptime well is important. The fact that you can insert a popular freesite
> > from a cafe with wifi is a major benefit in terms of features and use
> > cases, we should play to it.
> 
> Agreed. I was in fact just thinking of "it works without fiddling like crazy 
> with two extra apps, micromanaging the WoT all day, etc". Basically, 
> install, wait (whatever time is reasonable), and use.

Agreed, working out of the box is an important goal. And it's feasible for 
reading stuff. For writing stuff you're gonna have to do captchas though, and 
occasionally change trust levels, hopefully from messages. :|
> 
> >> Backing this position are emerging projects aiming at providing shared
> >> global hard-disks, like omemo, wuala and others (you could even include
> >> dropbox and the myriad of similar ones here, although their sharing
> >> capabilities are not (I think) intended for groups of untrusted people).
> >> In a way they're doing what Freenet does in providing free "cloud"
> >> storage.
> > 
> > Wuala uses 517% FEC redundancy plus central backup servers. IMHO we will
> > need something like that level of redundancy (although take into account
> > that we already have some redundancy at the block level).
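
A rough back-of-the-envelope sketch of what a redundancy figure like that buys
(the numbers and the independence assumption are purely illustrative, not
Freenet's actual FEC parameters): with erasure coding, k data blocks become
roughly n = 5.17 * k total blocks, and the file is recoverable as long as any
k of them can still be fetched.

    // Illustrative only: assumes each block survives independently with
    // probability p, which is a big simplification of real network behaviour.
    public class RedundancySketch {

        /** Probability that at least k of n blocks are still retrievable. */
        static double recoveryProbability(int n, int k, double p) {
            double prob = 0.0;
            for (int i = k; i <= n; i++) {
                prob += binomial(n, i) * Math.pow(p, i) * Math.pow(1 - p, n - i);
            }
            return prob;
        }

        // n-choose-i as a double; accurate enough for sketch-sized n.
        static double binomial(int n, int i) {
            double c = 1.0;
            for (int j = 1; j <= i; j++) {
                c *= (n - i + j) / (double) j;
            }
            return c;
        }

        public static void main(String[] args) {
            int k = 100;                       // data blocks
            int n = (int) Math.ceil(5.17 * k); // ~517% total size, Wuala-style
            for (double p : new double[] {0.2, 0.3, 0.5}) {
                System.out.printf("per-block availability %.1f -> recovery %.6f%n",
                        p, recoveryProbability(n, k, p));
            }
        }
    }

The point is that high redundancy keeps retrieval working even after most
individual blocks have dropped out, which is exactly the persistence problem
being discussed here.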
> > 
> > No idea about omemo, beyond what wikipedia says: it's an open source
> > storage DHT with randomisation.
> 
> Yep, I'm not familiar with the internals either, but the idea is that you 
> get a new network drive where tags are somehow translated into folders, and 
> this drive is shared by everyone. It never left early alpha state though, 
> and it was never clear how issues like spam would be handled.
> 
> I'm not sure whether the legal process has killed development or whether it 
> will continue eventually.
> 
> In any case, I don't mention these projects to point to specific technical 
> details/shortcomings/advantages in comparison to Freenet; I see them as a 
> sign that there's a desire for this kind of anonymous global cloud storage.

It's a good idea, so people keep coming back to it.
> 
> >> The case of omemo is of particular interest to me, not only because its
> >> creator is a compatriot, a long-time prominent figure in the p2p scene,
> >> and awaiting sentence for *creating* a p2p application (actually the
> >> charge is unfair competition, to the tune of $13 million), but because
> >> his programming trajectory, going from mp2p to omemo, follows what I see
> >> as the dominant trend across p2p generations.
> > 
> > Ooops!
> >> 
> >> So better to be there leading than to miss the train to some
> >> not-so-secure alternative.
> > 
> > Not so secure alternatives will be faster for a long time, and may have
> > commercial muscle behind them and therefore better UIs. We will not be
> > there before them. But we may eventually outcompete them for some markets
> > - people who really care about privacy, for example.
> 
> Right. There's always the old story about being there first with something 
> good enough versus arriving later with a "perfect" product.

Good enough is hard for Freenet. Making Freenet fast is hard.
> 
> Definitely, Freenet's aim of hardcore privacy makes it the only choice for 
> a certain group of people. It's only that I fear hardcore privacy may 
> become a necessity in all p2p, and then Freenet will not have any advantage 
> over the others. But that is still a long way off, I hope.

Well then the others have a long way to go!
> 
> >> That said, I'm with you in that several pieces are needed, and all of
> >> them require lots of work (even in theoretical aspects).
> >> 
> >> a) Data persistence
> > 
> > Lots of work needs to be done here. Much has already been done. And this
> > is data persistence in the broadest sense - it incorporates how fast you
> > can find rare or old data as well as whether you can find it at all.
> > 
> >> b) Distributed scalable spam-resistant indexing (!)
> > 
> > This is hard. I will try to explain the technical issues.
> > 
> > infinity0's main, more or less complete, contribution has been the new
> > index format. This is essentially an on-Freenet btree (a standard
> > structure used in databases), which means it scales to any size without
> > any of the parts getting enormous, whereas the current index format has
> > problems when it gets big.
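
To make the shape of that concrete, here is a minimal sketch of the idea, not
the actual Library/Interdex format (the node layout, the NodeFetcher interface
and all the names are hypothetical): each btree node is a separately inserted
block, internal nodes hold sorted separator terms plus the URIs of their
children, and a lookup fetches O(log n) small blocks instead of one enormous
index file.

    import java.util.List;

    interface NodeFetcher {
        IndexNode fetch(String freenetUri); // hypothetical: fetch and parse one node
    }

    class IndexNode {
        boolean leaf;
        List<String> terms;      // sorted terms (separators in internal nodes)
        List<String> childUris;  // URIs of child nodes (internal nodes only)
        List<String> resultUris; // URIs of result lists (leaf nodes only)
    }

    class BtreeIndexLookup {
        /** Walk from the index root down to the leaf that may hold the term. */
        static String lookup(NodeFetcher fetcher, String rootUri, String term) {
            IndexNode node = fetcher.fetch(rootUri);
            while (!node.leaf) {
                // Pick the child whose key range covers the term.
                int child = 0;
                while (child < node.terms.size()
                        && term.compareTo(node.terms.get(child)) > 0) {
                    child++;
                }
                node = fetcher.fetch(node.childUris.get(child));
            }
            int pos = node.terms.indexOf(term);
            return pos >= 0 ? node.resultUris.get(pos) : null;
        }
    }

Because changing a subtree only touches the nodes on the path back to the
root, an index laid out this way can be updated without reinserting
everything, which is the gain described below.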
> > 
> > The first serious application of it will be to replace the current spider
> > format and write the data on the fly while indexing, rather than having to
> > wait for a week while the XML format data is written. It will also include
> > the actual insert to Freenet (unlike the current code), will support
> > updating an on-Freenet index without reinserting everything, and generally
> > will be a huge gain.
> > 
> > It is necessary to have a scalable structure for two main reasons:
> > 1) Searching freesites more or less requires a spider. This will put out
> > huge indexes.
> > 2) Distributed searching will likely involve some users aggregating other
> > users' indexes, so even for file searching, big indexes will be needed
> > eventually.
> > 
> > Now, for full distributed searching, we need more work. Most of that
> > (Interdex) has been specified, planned and designed in some detail by
> > infinity0, but it will not be ready in the near future as he has no time.
> > It is conceivable that others might finish it. Basically it will be a
> > WoT-based search system where each user publishes an index, and also links
> > to other people's indexes. It will be relatively slow, because it involves
> > fetching lots of data, but we can preload, like Perfect Dark does.
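
A hypothetical sketch of that aggregation idea, not Interdex's actual design
(the PublishedIndex layout, the 0-100 trust scale and the fetch budget are all
made up for illustration): start from your own index, follow links to indexes
published by identities trusted above some threshold, and merge their results,
capping the number of index fetches so one search does not try to pull the
whole network.

    import java.util.*;

    class PublishedIndex {
        Map<String, List<String>> results;  // term -> result URIs
        Map<String, Integer> linkedIndexes; // linked index URI -> trust (0-100)
    }

    interface IndexFetcher {
        PublishedIndex fetch(String indexUri); // hypothetical; would hit Freenet
    }

    class WotSearch {
        static List<String> search(IndexFetcher fetcher, String ownIndexUri,
                                   String term, int trustThreshold, int fetchBudget) {
            List<String> merged = new ArrayList<String>();
            Set<String> visited = new HashSet<String>();
            Deque<String> queue = new ArrayDeque<String>();
            queue.add(ownIndexUri);
            while (!queue.isEmpty() && visited.size() < fetchBudget) {
                String uri = queue.poll();
                if (!visited.add(uri)) continue;             // already fetched
                PublishedIndex idx = fetcher.fetch(uri);
                merged.addAll(idx.results.getOrDefault(term,
                        Collections.<String>emptyList()));
                idx.linkedIndexes.forEach((linkedUri, trust) -> {
                    if (trust >= trustThreshold) queue.add(linkedUri);
                });
            }
            return merged;
        }
    }

The fetch budget is what makes this slow in practice, since every hop is
another Freenet fetch; preloading indexes in the background, Perfect Dark
style, is how that latency could be hidden.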
> > 
> > And then you have scaling issues with the WoT-based search itself.
> > Although I am of the view that scaling issues on Freenet are not quite the
> > same as you might expect: Fetching popular data is easy and produces very
> > little network load. However, if the set of indexes grows very large, life
> > gets difficult. This is a problem for centralised searching (like the
> > current freesite indexes) and even more so for decentralised searching...
> 
> Thanks for summarizing the issues. 
> 
> I see all these difficulties, and then some. For example, cooperative index 
> building. Can this be completely replaced by aggregated indexes relying on a 
> WoT? I say this because I see a basic difference at work here. In Freenet, I 
> understand that a btree is inserted under some key, and thus can be authored 
> by a single "entity" (or collection of trusted peers). However, in 
> traditional p2p, distributed hash tables (e.g. Kademlia) are built by all 
> peers. Or am I wrong here about how btrees will work in Freenet?
> 
> Perhaps I'm being overly concerned, and aggregation of trusted, individual 
> indexes is as good as or better than a huge hashtable. I don't really know.

It is better because we cannot create a big hashtable without being vulnerable 
to spam and DoS attacks. Other networks with different privacy models might be 
able to create a keyword DHT with spam resistance, but we can't as far as 
anyone can tell.
> 
> >> c) Insertion on demand.
> >> 
> >> Also, like you, I wonder whether it's better to get something working
> >> ASAP and refine it later. The death of the Frost system should be a
> >> warning, but...
> > 
> > It's a question of what you mean by "working". Once we have any form of
> > distributed search, we will have something that works, after a fashion,
> > even if we do not have insert on demand. We could then implement a quick
> > hack for insert on demand, but I'm not sure whether that is the right
> > approach given serious long-term issues with insert on demand which I am
> > not sure how to solve...
> > 
> > There are several serious problems with insert on demand:
> > 
> > 1) Insert time security. If the attacker knows what data you are going to
> > insert in advance, it makes some very serious attacks *much* easier.
> > 2) Uptime issues. The data will only be inserted if somebody who has it is
> > online. Thus it can take days, and response time patterns of inserters may
> > give away their location etc.
> 
> > 3) DoS issues. How do you prevent bogus
> > requests for data in an automated insert on demand system? Maybe WoT will
> > help, but it's early days. 
> 
> Well, the worst that may happen is that you get requests for everything you 
> have, and then you're in the scenario you point to in 4).

Right. In which case what was the point in the first place?
> 
> However, since a node has a maximum insertion rate, I don't see a big issue 
> here, as long as insertion priorities are working.
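
For what it's worth, a minimal sketch of the mechanism being described above
(all names are hypothetical, and it only bounds the damage rather than solving
the incentive problem raised below): on-demand insert requests go into a queue
ordered by the requester's WoT trust, and a worker drains it no faster than
the node's insert capacity, so bogus requests can at worst crowd out
low-priority work instead of consuming unlimited bandwidth.

    import java.util.concurrent.PriorityBlockingQueue;

    // One queued request for an on-demand insert.
    class InsertRequest implements Comparable<InsertRequest> {
        final String fileKey;     // which shared file was requested
        final int requesterTrust; // 0-100 from WoT; higher is served first

        InsertRequest(String fileKey, int requesterTrust) {
            this.fileKey = fileKey;
            this.requesterTrust = requesterTrust;
        }

        public int compareTo(InsertRequest other) {
            return Integer.compare(other.requesterTrust, this.requesterTrust);
        }
    }

    class OnDemandInsertQueue {
        private final PriorityBlockingQueue<InsertRequest> queue =
                new PriorityBlockingQueue<InsertRequest>();
        private final long minMillisBetweenInserts; // caps the insert rate

        OnDemandInsertQueue(long minMillisBetweenInserts) {
            this.minMillisBetweenInserts = minMillisBetweenInserts;
        }

        void enqueue(InsertRequest request) {
            queue.put(request);
        }

        /** Worker loop: highest-trust request first, never above the rate cap. */
        void run() throws InterruptedException {
            while (true) {
                InsertRequest next = queue.take(); // blocks until work arrives
                startInsert(next.fileKey);
                Thread.sleep(minMillisBetweenInserts);
            }
        }

        private void startInsert(String fileKey) {
            // Placeholder: a real implementation would kick off a Freenet insert.
            System.out.println("inserting " + fileKey);
        }
    }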

The point is that DoS attacks make insert on demand pointless.
> 
> > 4) Capacity. IMHO if Freenet is working well we
> > should not need insert on demand: Its capacity should be much greater than
> > it is now, and we should be able to just insert and fetch the data.
> 
> In this regard, there's an advantage to on-demand insertion: you may well 
> end up never inserting a huge part of your shared files at all. Given the 
> time insertion takes, I see this as one of the biggest gains.

Maybe. But if you did insert it, it would be accessible immediately to people 
who click the file - if Freenet is working.