On Mon, Jun 7, 2010 at 8:47 PM, <gst...@apache.org> wrote:
> Author: gstein
> Date: Tue Jun 8 00:47:22 2010
> New Revision: 952493
>
> URL: http://svn.apache.org/viewvc?rev=952493&view=rev
> Log:
> The query that we used to fetch all children in BASE_NODE and WORKING_NODE
> used a UNION between two SELECT statements. The idea was to have SQLite
> remove all duplicates for us in a single query. Unfortunately, this caused
> SQLite to create an ephemeral (temporary) table and place the results of
> each query into that table. It created an index to remove duplicates. Then
> it returned the values in that ephemeral table. For large numbers of
> nodes, the construction of the table and its index becomes very costly.
>
> This change rebuilds gather_children() in wc_db.c to do the duplicate
> removal manually using a hash table. It does some simple scanning straight
> into an array when it knows duplicates cannot exist (one of BASE or
> WORKING is empty).
>
> The performance problem of svn_wc__db_read_children() was first observed
> in issue #3499. The actual performance improvement is untested so far, but
> I'm assuming pburba can pick up this change and try in his scenario.
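For readers who want to see the two strategies from the log message side by side, here is a rough sketch in Python using the standard sqlite3 module. It is not the code from wc_db.c. The table names come from the log message, but the two-column schema, the column names (parent_relpath, local_relpath), and the helper names make_db, gather_children_union and gather_children_manual are assumptions made up for illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE base_node (
            parent_relpath TEXT, local_relpath TEXT PRIMARY KEY);
        CREATE TABLE working_node (
            parent_relpath TEXT, local_relpath TEXT PRIMARY KEY);
    """)
    return conn


def gather_children_union(conn, parent):
    """Old approach: let SQLite remove duplicates with a UNION."""
    rows = conn.execute(
        "SELECT local_relpath FROM base_node WHERE parent_relpath = ? "
        "UNION "
        "SELECT local_relpath FROM working_node WHERE parent_relpath = ?",
        (parent, parent))
    return [row[0] for row in rows]


def gather_children_manual(conn, parent):
    """New approach: two plain SELECTs, duplicates removed in a hash table."""
    base = [row[0] for row in conn.execute(
        "SELECT local_relpath FROM base_node WHERE parent_relpath = ?",
        (parent,))]
    working = [row[0] for row in conn.execute(
        "SELECT local_relpath FROM working_node WHERE parent_relpath = ?",
        (parent,))]

    # If either side is empty there can be no duplicates across the two
    # tables, so the other side's list is returned as-is.
    if not base:
        return working
    if not working:
        return base

    # Otherwise use a dict as the hash table; keys are the child paths.
    seen = dict.fromkeys(base)
    seen.update(dict.fromkeys(working))
    return list(seen)
```

Both functions return the same set of children. The manual version relies on each table holding a given path at most once, which the sketch enforces with the PRIMARY KEY on local_relpath.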
On Mon, Jun 7, 2010 at 8:53 PM, Greg Stein <gst...@gmail.com> wrote:
> Hey Paul,
>
> Can you try this change on your large-file-count working copies? I
> believe this should fix the performance problems you were seeing.

Greg,

Short Story: Hours to Seconds

Long Story: This does indeed solve the problems I was seeing:

My test repository was our test suite's Greek tree but with 17,000
1KB files in a single directory:

Prior to r952493, update and status were taking *quite* some time:

  svn st   01:23:33
  svn up   Gave up after an hour (i.e. lasted longer than my lunch).

With your fix in place, performance improves dramatically:

  svn st   00:00:17
  svn up   00:00:11

Paul

P.S. Thanks! I was nowhere near figuring this out :-\