On 4/14/2010 19:23, Zooko Wilcox-O'Hearn wrote:
Our project web site was just down for about an hour and a half a couple
of hours ago. The reason turned out to be that there were about a dozen
darcs processes running trying to answer queries like this:
darcs query contents --quiet --match "hash
20080103234853-92b7f-966e01e6a40dbe94209229f459988e9dea37013a.gz"
"docs/running.html"
This is the query that the trac-darcs plugin issues when you hit this
web page:
http://tahoe-lafs.org/trac/tahoe-lafs/changeset/1782/docs/running.html
That particular query when run in isolation (i.e. not concurrently with
dozens of other queries) takes at least 20 seconds, and about 59 MB of RAM.
Enough of these outstanding queries had piled up that the server ran out
of RAM and stopped serving our trac instance or allowing ssh access for
about an hour and a half.
All of which goes to show that Trac+darcs still isn't well optimized for
caching darcs queries or dealing gracefully with long-running command
invocations... I still say that Trac's reliance on CVS/SVN-style
revision numbers means that Trac is absolutely not well-adapted for
serving darcs repositories. It may be "revision 1782" to Trac, but 'show
contents --match "hash 2008..."' is "commute this file to how it would
appear if only the patches preceding or equal to this one with a
timestamp from two years ago were applied" to darcs. (Which quite
possibly isn't a "real" historic version at all, and which is an awful
lot of work for something so easily hammered by crawlers/DDoS/accidental
DDoS...)
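The caching half of that complaint is easy to sketch, since the contents of a file at a given patch hash can never change, so a (hash, path) pair is a perfect cache key. This is just a hypothetical illustration, not trac-darcs's actual code; `run_darcs` stands in for whatever really shells out to darcs and is injected only to keep the sketch self-contained:

```python
# Hypothetical memoization sketch (not trac-darcs's real API): the
# (patch hash, file path) pair is immutable, so the expensive darcs
# invocation need only ever happen once per key.

_cache = {}

def file_at_patch(patch_hash, path, run_darcs):
    """Return file contents at a patch, invoking darcs at most once per key."""
    key = (patch_hash, path)
    if key not in _cache:
        # Only on a cache miss do we pay the ~20 s / ~59 MB price.
        _cache[key] = run_darcs([
            "darcs", "query", "contents", "--quiet",
            "--match", f"hash {patch_hash}", path,
        ])
    return _cache[key]
```

With something like this in front, a dozen crawler hits on the same changeset page would cost one darcs process instead of a dozen.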
20 seconds doesn't sound unreasonable when you consider that you are
asking darcs to construct an entirely new "version" of a file. While I
expect there is plenty of performance left to squeeze out of this, I
don't think a query like this one will ever approach git/svn/... historic
revision lookup, because it is an entirely different beast. It doesn't
make sense to me for Trac to rely on it for common queries.
Maybe you should sponsor someone to work on "web scalability" for you.
For instance, a bit of AJAXy "long-running process" support ("Please
wait while this ahistoric version is fetched...") and a basic task queue
(RabbitMQ, Amazon SQS, whatever) to keep the server from biting off more
than it can chew at any given point... (Or even spreading about the
cache generation misery to more than one server. Queues are very useful
that way.)
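The "biting off more than it can chew" part doesn't even need a full queue broker to start with. A bare-bones sketch, assuming nothing about Trac's internals: cap the number of darcs subprocesses in flight with a semaphore, so a burst of requests waits its turn instead of exhausting RAM. MAX_WORKERS and the timeout here are made-up values, not existing trac-darcs settings:

```python
import subprocess
import threading

# Hypothetical throttle, not an existing trac-darcs feature: bound how
# many darcs processes run concurrently. A crawler burst then queues up
# behind the semaphore instead of forking a dozen ~59 MB processes at
# once and taking the whole box down.
MAX_WORKERS = 2  # assumed cap; tune for available RAM
_slots = threading.BoundedSemaphore(MAX_WORKERS)

def run_limited(cmd, timeout=60):
    """Run a command, blocking until one of MAX_WORKERS slots is free."""
    with _slots:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
        return result.stdout
```

A real deployment would want the RabbitMQ/SQS version so the waiting (and the cache generation) can move off the web server entirely, but even this much would have kept the box serving ssh.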
Forgive my petulance, but it seems fairly odd to me that for
someone working on a project for decentralized, scalable data storage
you seem fairly blind to web scalability issues when it comes to
Trac+Darcs...
--
--Max Battcher--
http://worldmaker.net
_______________________________________________
darcs-users mailing list
darcs-users@darcs.net
http://lists.osuosl.org/mailman/listinfo/darcs-users