Today no fewer than two people said they were considering contributing to my open source project -- tahoe -- and both of them were surprised at how extremely long it takes to "darcs get http://allmydata.org/source/tahoe/trunk tahoe". Here is the e-mail from the second one, who suggests switching to git.
Hopefully darcs-2.3 will address this kind of thing? :-/

Regards,

Zooko

Begin forwarded message:

> From: Shawn Willden <[email protected]>
> Date: January 7, 2009 17:07:29 PM MST
> To: [email protected]
> Subject: Re: [tahoe-dev] Thinking about building a P2P backup system
> Reply-To: [email protected]
>
> On Wednesday 07 January 2009 03:09:21 pm zooko wrote:
>> So, if you're planning to contribute patches, bug
>> reports, documentation, etc., then I'm delighted!
>
> Well, assuming I can get my head around the codebase sufficiently in the
> snatches of time I have available, I absolutely want to contribute all of
> the above.
>
>> Have you tried it? It might be just fine for sharing photos. I use
>> Tahoe to share photos, but I use the public test grid instead of a
>> private grid, and so I'm using many servers located in a co-lo plus a
>> handful of random servers operated by Tahoe hackers or curious
>> users. It seems to work fine.
>
> I imagine you also have a pretty fast network connection yourself, too.
> Not to put too much emphasis on my particular case, but I shoot with a
> moderately high-end DSLR so my image files tend to be large, and most of
> my family has low-end DSL connections. At 1 mbps, it takes at least 40
> seconds to download a 5 MB image file, which would be painfully slow for
> browsing through pictures -- and that's if the pipe can be filled. If the
> images are coming from a handful of 256 kbps connections where Tahoe is
> bandwidth-capped to use no more than 100 kbps in order to keep some
> bandwidth available for other stuff (does Tahoe have bandwidth limiting?
> If not, it probably needs it), then the aggregate data stream may be no
> more than 400-500 kbps.
>
> And let's not even talk about HD video.
>
>> As far as I know, we are doing adequately well on that goal. A few
>> times people have asked to have the option to turn off the
>> encryption, and in each case I asked them to please measure the
>> performance and tell me if the encryption is causing a performance
>> problem or another kind of usability problem.
>
> I'd be shocked if encryption were a performance problem. Crypto stuff has
> been my day job for over a decade, so I'm well aware of how blisteringly
> fast AES is, and RSA isn't too bad as long as you're not doing too much of
> it (especially if you're doing mostly public key ops, not private key).
> I'd expect a lot bigger performance issue from the erasure coding (BTW:
> ever considered Tornado coding instead of Reed-Solomon?).
>
> However, my real concern isn't CPU usage, particularly since the heavy
> lifting happens during storage, not retrieval. I'm thinking about
> bandwidth, both being able to rsync changes -- important because most home
> users' net connections are very asymmetric -- and to avoid hitting the
> network at all in the "Mom browsing my photos" case.
>
> I'm talking from theory here, not measurements, but I think I can predict
> pretty well what the performance of the sort of network I'm thinking about
> would be.
>
>> I want Tahoe to offer the user (human or computer) more control and
>> more knowledge about which shares go to which storage server.
>
> Okay, so here's a possibility. If I can ensure that K shares are stored on
> my mom's machine, and if Tahoe is clever enough to use those shares when
> she's browsing those files (doesn't seem difficult), rather than pulling
> from the network, then perhaps browsing my photos will be fast enough. The
> RS reconstruction and the decryption shouldn't be a big deal, and neither
> should applying a short sequence of forward deltas. Some performance
> testing is in order.
>
>> Yes, that's what it currently does (if you chose to share your "added
>> convergence secret" with all clients on the backup network).
>
> Cool. That's probably good enough that the added optimization of avoiding
> the storage of common files completely isn't worth the effort.
>
>>> To improve this, storage servers could index their local files and
>>> note when a request to store a share for a file they possess
>>> arrives.
>
>> By the way, the GNUnet project offers that feature, so you should
>> check them out.
>
> Thanks, I'll take a look.
>
>>> Next, I want incremental backups and versioning, and I want them to
>>> be done bandwidth-efficiently.
>>
>> Have you seen the duplicity plugin that Francois Deppierraz posted?
>> Maybe that does exactly what you want. :-)
>
> I'll look, but if it works at the tarball level like duplicity, then no,
> it's not what I want.
>
>> I would prefer if you used Tahoe and contribute patches, and if it
>> turns out that there is some behavior that you really want and that
>> seems too troublesome to me to risk including it in my codebase, then
>> I would prefer that you copy the Tahoe darcs repository and develop
>> your own branch.
>
> Okay. I grabbed the darcs repo (dang is that sloowww! Anybody for
> switching to git? ;-)) and I'll start from there.
>
> I haven't had a chance to look through the code much yet. Is there an
> overview document somewhere that covers the structure?
>
> Thanks,
>
> Shawn.

_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
