Hi!

Thanks a lot for the detailed survey. I'll comment inline.

Nathan Gray <[email protected]> writes:
> On the (rather large) cap repository, darcs22pre2 used about a third
> more memory and took more than three times as long to get.
Could you please try running

    time darcs22pre2 get cap2 cap2-prime

and report the user and system times? Also, it would be good to know how many
files you have under the "patches" and "pristine.hashed" subdirectories of
_darcs in the hashed repository.
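
If counting by hand is tedious, something along these lines should do. This is
only a minimal sketch: the function name count_entries is made up for
illustration, and it assumes the repository root contains the usual _darcs
directory with the two subdirectories named above.

```python
import os

def count_entries(repo_root, subdirs=("patches", "pristine.hashed")):
    """Return {subdir: number of directory entries} under repo_root/_darcs.

    A subdirectory that does not exist is reported as None instead of
    raising, so the function also works on non-hashed repositories.
    """
    counts = {}
    for name in subdirs:
        path = os.path.join(repo_root, "_darcs", name)
        counts[name] = len(os.listdir(path)) if os.path.isdir(path) else None
    return counts
```

Calling count_entries("cap2") would then give the two numbers I am asking for.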

> For some reason the check and repair benchmarks failed for darcs109,
> but darcs22pre2 took more time and used more memory for these on the
> version 2 repository than on the version 1 repository.  Likewise for
> repair.
That's an interesting observation as well. I don't have an explanation, since
the code doing repair and check is identical for both repository types as far
as I can tell. Oh, wait. I probably know. This is closely related to the above
question about system and user times. Let me elaborate.

Darcs hashed repositories currently store everything directly under a single
directory as a flat list, i.e.:

_darcs/pristine.hashed/hash1
_darcs/pristine.hashed/hash2
_darcs/pristine.hashed/hash3

etc.

However, many operating systems and filesystems handle large directories
very inefficiently. One might expect directory size to make no difference
(the theoretical bound doesn't change in the slightest), but in practice
operations on large directories can be orders of magnitude slower.

Coupled with the darcs global cache, this can make things really
bad. Unfortunately, that means we can't improve the performance for you
until a bucketed hashed repository format is implemented. I will try to
get that rolling for 2.3, but I can't make any promises.
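
To illustrate what I mean by bucketing, here is a rough sketch. It is not the
format darcs will necessarily adopt: the two-character prefix, the function
names, and the assumption that file names are plain hex digests are all
hypothetical choices made for the example.

```python
import os
from collections import Counter

def bucketed_path(root, content_hash, width=2):
    """Map a hash to root/<prefix>/<hash> instead of root/<hash>.

    Hypothetical layout: the bucket name is the first `width` characters
    of the hash, so hex names with width=2 give at most 256 buckets.
    """
    return os.path.join(root, content_hash[:width], content_hash)

def bucket_sizes(hashes, width=2):
    """Count how many entries would land in each bucket directory."""
    return Counter(h[:width] for h in hashes)
```

With a layout like this, no single directory holds more than a small fraction
of the entries, which is the whole point of the exercise.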

> Pulls are somewhat similar when using a version 1 repository, but use
> more than six times as much memory and take about twice as long on the
> version 2 repository.
This is another area where high-level optimisation could help. In fact, the
same approach that I used for check and repair would help a lot here. You can
see that applying 100 patches takes almost a third of the time that
check/repair in darcs22pre2 takes to apply all of the patches in the
repository.

[snip numbers]

> On the (somewhat smaller) systems repository, getting used less than a
> tenth as much memory on the version 2 repository, but took five times
> as long.
Again, the big-directory issue strikes here. If you have the global cache
enabled (likely with darcs22pre2), this is even worse, as you pay an even
bigger penalty for every cache access than for accessing repository-local data.

> It took less time to check and to repair using darcs22pre2, but used
> two to three times as much memory as darcs109.
>
> Pulling fewer patches on darcs109 was faster, but pulling more patches
> was faster using darcs22pre2.  Memory usage was similar for all of
> the 'pull' benchmarks.

[snip more numbers]

> I am encouraged that darcs22pre2 on a version 2 repository is performing so
> much better than earlier versions of darcs2.  I am still concerned that it
> uses so much more memory for check and repair, and sometimes pull, and that
> it is so much slower for pulls and gets.
One part of the high memory usage is that we now have a limit on how much
changed file content is retained in memory. This is currently hard-coded at
100M. That still doesn't explain the 200+ megabytes we are seeing there,
though. It wouldn't be hard to reduce memory usage by, say, 50M at the expense
of slower check/repair. This is possibly something to be fine-tuned.

Once some high-level optimisations are applied (hopefully in time for darcs
2.3), (local) get and pull performance should improve significantly.

Yours,
   Petr.

-- 
Peter Rockai | me()mornfall!net | prockai()redhat!com
 http://blog.mornfall.net | http://web.mornfall.net

"In My Egotistical Opinion, most people's C programs should be
 indented six feet downward and covered with dirt."
     -- Blair P. Houghton on the subject of C program indentation
_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
