On Thu, Jan 08, 2009 at 05:40:38PM +0100, Petr Rockai wrote:
> Nathan Gray writes:
> > On the (rather large) cap repository, darcs22pre2 used about a third
> > more memory and took more than three times as long to get.
>
> Could you please try running time darcs22pre2 get cap2 cap2-prime and report
> the user and system times? Also, it would be good to know how many files you
> have under the "patches" and "pristine.hashed" subdirectories of _darcs in the
> hashed repository.

$ time darcs22pre2 get cap2 cap2-prime
Copying patches, to get lazy repository hit ctrl-C...
Finished getting.

real    3m0.111s
user    2m0.744s
sys     0m25.230s

$ ls cap2/_darcs/patches/| wc -l
26411
$ ls cap2/_darcs/pristine.hashed/| wc -l
59286

> > For some reason the check and repair benchmarks failed for darcs109,
> > but darcs22pre2 took more time and used more memory for these on the
> > version 2 repository than on the version 1 repository.  Likewise for
> > repair.
>
> That's an interesting observation as well. I don't have an explanation, since
> the code doing repair and check is identical for both repository types as far
> as I can tell. Oh, wait. I probably know. This is closely related to the above
> question about system and user times. Let me elaborate.
>
> Darcs hashed repositories currently store everything directly under a single
> directory as a flat list. Ie:
>
> _darcs/pristine.hashed/hash1
> _darcs/pristine.hashed/hash2
> _darcs/pristine.hashed/hash3
>
> etc.
>
> However, most operating systems and filesystems handle large directories
> extremely inefficiently. Although one would expect that this would make no
> difference (theoretical bound doesn't change the slightest), in practice the
> performance of large directories is orders of magnitude worse.

That is a reasonable explanation, especially considering that there
appear to be almost 60K files in the repository.

> Coupled with darcs global cache, this can make things real real
> bad. Unfortunately for you, that means that we can't improve the performance
> for you until a bucketed-hashed repository format is implemented. I will try
> to get it rolling for 2.3, but can't make any promises.

I will try to remember to disable global cache until bucketed-hashed
structure is implemented.

[snip]

> > I am encouraged that darcs22pre2 on a version 2 repository is performing so
> > much better than earlier versions of darcs2. I am still concerned that it
> > uses so much more memory for check and repair, and sometimes pull, and that
> > it is so much slower for pulls and gets.
>
> One part of the high memory usage is that we now have a limit on how much of
> changed file contents is retained in memory. This is currently hard-coded as
> 100M. It still doesn't explain the 200+ megabytes we are seeing there
> though. It wouldn't be hard to reduce memory usage by say 50M on expense of
> slower check/repair. This is possibly something to be fine-tuned.

Let me know if there are some other tests you want me to run in
regards to the high memory usage.

> Once some high-level optimisations are applied (hopefully in time for darcs
> 2.3), (local) get and pull performance should improve significantly.

I am glad that my input is helpful.  I am excited that the project
continues to move forward and that my needs appear to be on the radar.

-kolibrie
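
For reference, a minimal sketch of the difference between the flat
layout quoted above and a bucketed one. It assumes a bucket named after
the first two characters of each file name, which is only one possible
scheme; the thread does not say how darcs would bucket its hashed
files. The helper names flat_path, bucketed_path and bucket_sizes are
hypothetical and are not darcs functions, and the generated names are
stand-ins, not real darcs hashes.

```python
import hashlib
from collections import Counter
from pathlib import PurePosixPath


def flat_path(root: str, name: str) -> PurePosixPath:
    """Flat layout: every hashed file sits directly under root."""
    return PurePosixPath(root) / name


def bucketed_path(root: str, name: str, width: int = 2) -> PurePosixPath:
    """Bucketed layout (assumed scheme): the first `width` characters
    of the name pick a subdirectory, so no single directory grows huge."""
    return PurePosixPath(root) / name[:width] / name


def bucket_sizes(names, width: int = 2) -> Counter:
    """Count how many names would land in each bucket."""
    return Counter(name[:width] for name in names)


if __name__ == "__main__":
    root = "_darcs/pristine.hashed"
    # Stand-in names: hex digests of integers, one per file reported
    # by `ls cap2/_darcs/pristine.hashed/ | wc -l` in this message.
    names = [hashlib.sha256(str(i).encode()).hexdigest() for i in range(59286)]
    sizes = bucket_sizes(names)
    print(flat_path(root, names[0]))
    print(bucketed_path(root, names[0]))
    print(len(sizes), "buckets, largest holds", max(sizes.values()), "entries")
```

With two hexadecimal characters there are at most 256 buckets, so the
59286 entries counted above would average about 230 per directory
instead of sitting in a single one.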
_______________________________________________
darcs-users mailing list
http://lists.osuosl.org/mailman/listinfo/darcs-users
