On Mon, Jun 7, 2010 at 8:47 PM,  <gst...@apache.org> wrote:
> Author: gstein
> Date: Tue Jun  8 00:47:22 2010
> New Revision: 952493
>
> URL: http://svn.apache.org/viewvc?rev=952493&view=rev
> Log:
> The query that we used to fetch all children in BASE_NODE and WORKING_NODE
> used a UNION between two SELECT statements. The idea was to have SQLite
> remove all duplicates for us in a single query. Unfortunately, this caused
> SQLite to create an ephemeral (temporary) table and place the results of
> each query into that table. It created an index to remove duplicates. Then
> it returned the values in that ephemeral table. For large numbers of
> nodes, the construction of the table and its index becomes very costly.
>
> This change rebuilds gather_children() in wc_db.c to do the duplicate
> removal manually using a hash table. It does some simple scanning straight
> into an array when it knows duplicates cannot exist (one of BASE or
> WORKING is empty).
>
> The performance problem of svn_wc__db_read_children() was first observed
> in issue #3499. The actual performance improvement is untested so far, but
> I'm assuming pburba can pick up this change and try in his scenario.
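
For anyone following along, here is a rough sketch of the approach the
log message describes: run the two SELECTs separately and remove
duplicates by hand with a hash table, skipping that step when one side
is empty. It is Python with sqlite3, not the actual C code in wc_db.c.
The table layout (base_node / working_node with local_relpath and
parent_relpath columns) and the helper make_demo_db() are assumptions
made up for illustration only.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE base_node (local_relpath TEXT PRIMARY KEY,
                                parent_relpath TEXT);
        CREATE TABLE working_node (local_relpath TEXT PRIMARY KEY,
                                   parent_relpath TEXT);
    """)
    conn.executemany("INSERT INTO base_node VALUES (?, ?)",
                     [("A/f1", "A"), ("A/f2", "A")])
    conn.executemany("INSERT INTO working_node VALUES (?, ?)",
                     [("A/f2", "A"), ("A/f3", "A")])
    return conn


def gather_children(conn, parent):
    """Return the children of `parent` found in either table, each listed once."""
    base = [row[0] for row in conn.execute(
        "SELECT local_relpath FROM base_node WHERE parent_relpath = ?",
        (parent,))]
    working = [row[0] for row in conn.execute(
        "SELECT local_relpath FROM working_node WHERE parent_relpath = ?",
        (parent,))]

    # Fast path: if one side is empty, duplicates cannot exist, so the
    # other side's rows can be returned as-is. (This relies on each table
    # holding a given path at most once, which the PRIMARY KEY ensures here.)
    if not base or not working:
        return base or working

    # Otherwise merge through a hash table (a dict) so a path present in
    # both tables appears only once in the result.
    seen = dict.fromkeys(base)
    seen.update(dict.fromkeys(working))
    return list(seen)


if __name__ == "__main__":
    print(sorted(gather_children(make_demo_db(), "A")))
```

The point of the sketch is only that the caller does the duplicate
removal, so SQLite never has to build a temporary table and index for
the UNION.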

On Mon, Jun 7, 2010 at 8:53 PM, Greg Stein <gst...@gmail.com> wrote:
> Hey Paul,
>
> Can you try this change on your large-file-count working copies? I
> believe this should fix the performance problems you were seeing.

Greg,

Short Story: Hours to Seconds

Long Story: This does indeed solve the problems I was seeing:

My test repository was our test suite's Greek tree, but with 17,000 1KB
files in a single directory.

Prior to r952493, update and status were taking *quite* some time:

  svn st  01:23:33
  svn up  Gave up after an hour (i.e. lasted longer than my lunch).

With your fix in place, performance improves dramatically:

  svn st  00:00:17
  svn up  00:00:11

Paul

P.S. Thanks!  I was nowhere near figuring this out :-\
