On Thu, Jan 08, 2009 at 05:40:38PM +0100, Petr Rockai wrote:
> Nathan Gray <[email protected]> writes:
> > On the (rather large) cap repository, darcs22pre2 used about a third
> > more memory and took more than three times as long to get.
>
> Could you please try running time darcs22pre2 get cap2 cap2-prime and report
> the user and system times? Also, it would be good to know how many files you
> have under the "patches" and "pristine.hashed" subdirectories of _darcs in the
> hashed repository.

$ time darcs22pre2 get cap2 cap2-prime
Copying patches, to get lazy repository hit ctrl-C...
Finished getting.

real    3m0.111s
user    2m0.744s
sys     0m25.230s

$ ls cap2/_darcs/patches/| wc -l
26411

$ ls cap2/_darcs/pristine.hashed/| wc -l
59286

> > For some reason the check and repair benchmarks failed for darcs109,
> > but darcs22pre2 took more time and used more memory for these on the
> > version 2 repository than on the version 1 repository.  Likewise for
> > repair.
>
> That's an interesting observation as well. I don't have an explanation, since
> the code doing repair and check is identical for both repository types as far
> as I can tell. Oh, wait. I probably know. This is closely related to the above
> question about system and user times. Let me elaborate.
> 
> Darcs hashed repositories currently store everything directly under a single
> directory as a flat list. I.e.:
> 
> _darcs/pristine.hashed/hash1
> _darcs/pristine.hashed/hash2
> _darcs/pristine.hashed/hash3
> 
> etc.
> 
> However, most operating systems and filesystems handle large directories
> extremely inefficiently. Although one would expect that this would make no
> difference (the theoretical bound doesn't change in the slightest), in practice
> the performance of large directories is orders of magnitude worse.

That is a reasonable explanation, especially considering that there
appear to be almost 60K files in the repository.

> Coupled with darcs global cache, this can make things real real
> bad. Unfortunately for you, that means that we can't improve the performance
> for you until a bucketed-hashed repository format is implemented. I will try to
> get it rolling for 2.3, but can't make any promises.

I will try to remember to disable the global cache until the
bucketed-hashed structure is implemented.
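
A minimal sketch, in Python, of how a bucketed layout differs from the flat
one described above. This is an illustration only, not the darcs
implementation or its planned format: the function names, the two-character
bucket width, and the assumption that file names begin with hex digits of a
hash are all hypothetical.

```python
import os
from collections import Counter


def flat_path(root, name):
    """Flat layout: every hashed file sits directly under root."""
    return os.path.join(root, name)


def bucketed_path(root, name, width=2):
    """Bucketed layout (hypothetical): the first `width` characters of the
    file name select a subdirectory, so no single directory grows huge."""
    return os.path.join(root, name[:width], name)


def bucket_sizes(names, width=2):
    """Count how many names would land in each bucket."""
    return Counter(name[:width] for name in names)
```

With two hex characters there are at most 256 buckets, so the 59286 files
counted above would average roughly 230 entries per directory if the names
are evenly distributed.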

[snip]

> > I am encouraged that darcs22pre2 on a version 2 repository is performing so
> > much better than earlier versions of darcs2.  I am still concerned that it
> > uses so much more memory for check and repair, and sometimes pull, and that
> > it is so much slower for pulls and gets.
>
> One part of the high memory usage is that we now have a limit on how much of
> the changed file contents is retained in memory. This is currently hard-coded
> as 100M. It still doesn't explain the 200+ megabytes we are seeing there,
> though. It wouldn't be hard to reduce memory usage by, say, 50M at the expense
> of slower check/repair. This is possibly something to be fine-tuned.

Let me know if there are any other tests you want me to run
regarding the high memory usage.

> Once some high-level optimisations are applied (hopefully in time for darcs
> 2.3), (local) get and pull performance should improve significantly.

I am glad that my input is helpful.  I am excited that the project
continues to move forward and that my needs appear to be on the radar.

-kolibrie


_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
