Today no fewer than two people said they were considering contributing
to my open source project -- Tahoe -- and both of them were surprised
at how long it takes to run "darcs get
http://allmydata.org/source/tahoe/trunk tahoe".  Here is the e-mail
from the second one, who suggests switching to git.

Hopefully darcs-2.3 will address this kind of thing?  :-/

Regards,

Zooko


Begin forwarded message:

> From: Shawn Willden <[email protected]>
> Date: January 7, 2009 17:07:29 MST
> To: [email protected]
> Subject: Re: [tahoe-dev] Thinking about building a P2P backup system
> Reply-To: [email protected]
>
> On Wednesday 07 January 2009 03:09:21 pm zooko wrote:
>> So, if you're planning to contribute patches, bug
>> reports, documentation, etc., then I'm delighted!
>
> Well, assuming I can get my head around the codebase sufficiently in
> the snatches of time I have available, I absolutely want to
> contribute all of the above.
>
>> Have you tried it?  It might be just fine for sharing photos.  I use
>> Tahoe to share photos, but I use the public test grid instead of a
>> private grid, and so I'm using many servers located in a co-lo plus a
>> handful of random servers operated by Tahoe hackers or curious
>> users.  It seems to work fine.
>
> I imagine you also have a pretty fast network connection yourself.
> Not to put too much emphasis on my particular case, but I shoot with
> a moderately high-end DSLR, so my image files tend to be large, and
> most of my family has low-end DSL connections.  At 1 Mbps, it takes
> at least 40 seconds to download a 5 MB image file, which would be
> painfully slow for browsing through pictures -- and that's if the
> pipe can be filled.  If the images are coming from a handful of 256
> kbps connections where Tahoe is bandwidth-capped to use no more than
> 100 kbps in order to keep some bandwidth available for other stuff
> (does Tahoe have bandwidth limiting?  If not, it probably needs it),
> then the aggregate data stream may be no more than 400-500 kbps.
>
> And let's not even talk about HD video.
>
>> As far as I know, we are doing adequately well on that goal.  A few
>> times people have asked to have the option to turn off the
>> encryption, and in each case I asked them to please measure the
>> performance and tell me if the encryption is causing a performance
>> problem or another kind of usability problem.
>
> I'd be shocked if encryption were a performance problem.  Crypto has
> been my day job for over a decade, so I'm well aware of how
> blisteringly fast AES is, and RSA isn't too bad as long as you're not
> doing too much of it (especially if you're doing mostly public-key
> ops, not private-key).  I'd expect a much bigger performance issue
> from the erasure coding (BTW: ever considered Tornado coding instead
> of Reed-Solomon?).
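To make the erasure-coding idea concrete, here is a toy 2-of-3 code
using XOR parity (a hypothetical sketch, not Tahoe's actual
Reed-Solomon codec): any two of the three shares reconstruct the file.

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode_2of3(data):
    # Split the file into two halves and add one parity share;
    # any 2 of the 3 shares suffice to rebuild the original.
    half = len(data) // 2
    a, b = data[:half], data[half:half * 2]
    return [a, b, xor_bytes(a, b)]

def decode_2of3(a, b, parity):
    # Pass None for the one share that was lost.
    if a is None:
        a = xor_bytes(b, parity)
    elif b is None:
        b = xor_bytes(a, parity)
    return a + b

shares = encode_2of3(b"abcdef")
# lose the second share, recover from the first and the parity
print(decode_2of3(shares[0], None, shares[2]))  # b'abcdef'
```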
>
> However, my real concern isn't CPU usage, particularly since the
> heavy lifting happens during storage, not retrieval.  I'm thinking
> about bandwidth, both being able to rsync changes -- important
> because most home users' net connections are very asymmetric -- and
> avoiding hitting the network at all in the "Mom browsing my photos"
> case.
>
> I'm talking from theory here, not measurements, but I think I can
> predict pretty well what the performance of the sort of network I'm
> thinking about would be.
>
>> I want Tahoe to offer the user (human or computer) more control and
>> more knowledge about which shares go to which storage server.
>
> Okay, so here's a possibility.  If I can ensure that K shares are
> stored on my mom's machine, and if Tahoe is clever enough to use
> those shares when she's browsing those files (doesn't seem
> difficult), rather than pulling from the network, then perhaps
> browsing my photos will be fast enough.  The RS reconstruction and
> the decryption shouldn't be a big deal, and neither should applying a
> short sequence of forward deltas.  Some performance testing is in
> order.
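The share-selection idea could look something like this (a
hypothetical sketch -- `pick_shares`, the share map, and the server
names are invented for illustration, not Tahoe's actual API):

```python
def pick_shares(share_map, k, local="moms-pc"):
    # Prefer shares already on the local machine; go to the network
    # only for whatever is still needed to reach k shares.
    chosen = list(share_map.get(local, []))[:k]
    remote = []
    for server, shares in share_map.items():
        if server == local:
            continue
        for s in shares:
            if len(chosen) + len(remote) >= k:
                break
            remote.append((server, s))
    return chosen, remote

# All k=3 shares sit on mom's machine: zero network traffic.
local, remote = pick_shares({"moms-pc": [0, 1, 2], "colo-1": [3, 4]}, k=3)
print(local, remote)  # [0, 1, 2] []
```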
>
>> Yes, that's what it currently does (if you chose to share your "added
>> convergence secret" with all clients on the backup network).
>
> Cool.  That's probably good enough that the added optimization of
> completely avoiding the storage of common files isn't worth the
> effort.
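The "added convergence secret" mentioned above works roughly like this
(a minimal sketch assuming a SHA-256-based derivation -- Tahoe's
actual key-derivation details differ): clients that share the secret
derive the same key for the same file, so identical files deduplicate
across them.

```python
import hashlib

def convergence_key(plaintext, added_secret):
    # Convergent encryption: the key is derived from the file's own
    # contents plus a shared secret, so identical files encrypt to
    # identical ciphertexts for everyone who knows the secret.
    return hashlib.sha256(added_secret + plaintext).digest()

k1 = convergence_key(b"photo bytes", b"family-secret")
k2 = convergence_key(b"photo bytes", b"family-secret")
print(k1 == k2)  # True -- same file, same secret, same key
```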
>
>>> To improve this, storage servers could index their local files and
>>> note when a request to store a share for a file they possess arrives.
>
>> By the way, the GNUnet project offers that feature, so you should
>> check them out.
>
> Thanks, I'll take a look.
>
>>> Next, I want incremental backups and versioning, and I want them to
>>> be done bandwidth-efficiently.
>>
>> Have you seen the duplicity plugin that Francois Deppierraz posted?
>> Maybe that does exactly what you want.  :-)
>
> I'll look, but if it works at the tarball level like duplicity, then
> no, it's not what I want.
>
>> I would prefer that you use Tahoe and contribute patches, and if it
>> turns out that there is some behavior that you really want and that
>> seems too troublesome to me to risk including in my codebase, then
>> I would prefer that you copy the Tahoe darcs repository and develop
>> your own branch.
>
> Okay.  I grabbed the darcs repo (dang is that sloowww!  Anybody for
> switching to git? ;-)) and I'll start from there.
>
> I haven't had a chance to look through the code much yet.  Is there an
> overview document somewhere that covers the structure?
>
> Thanks,
>
>       Shawn.
_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
