On 27 April 2014 19:27,  <stef...@apache.org> wrote:
> Author: stefan2
> Date: Sun Apr 27 15:27:46 2014
> New Revision: 1590405
>
> URL: http://svn.apache.org/r1590405
> Log:
> More 'svn log -g' memory usage reduction.  We use a hash to keep track
> of all revisions reported so far, i.e. easily a million.
>
Hi Stefan,

Interesting findings, some comments below.

> That is 48 bytes / rev, allocated in small chunks.  The first results
> in 10s of MB dynamic memory usage while the other results in many 8k
> blocks being mmap()ed risking reaching the pre-process limit on some
> systems.
I don't understand this argument: why do small allocations result in
10s of MB of memory usage? Doesn't the pool allocator aggregate small
allocations into 8k blocks?

>
> We introduce a simple packed bit array data structure to replace
> the hash.  For repos < 100M revs, the initialization overhead is less
> than 1ms and will amortize as soon as more than 1% of all revs are
> reported.
>

It may be worth implementing the same trick we used in
membuffer_cache: use an array of bit arrays, one per 100k revisions
for example, and initialize them lazily. I mean:
[0...99999] - bit array 0
[100000....199999] -- bit array 1
...

It should be easy to implement.
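To make the idea concrete, here is a minimal sketch of such a lazily
initialized chunked bitmap. The names (rev_bitmap_t, REVS_PER_CHUNK,
the functions) are hypothetical illustrations, not from the Subversion
code base, and real code would of course allocate from an APR pool
rather than with malloc/calloc:

```c
#include <stdlib.h>

#define REVS_PER_CHUNK 100000  /* revisions covered by one sub-array */

typedef struct rev_bitmap_t
{
  unsigned char **chunks;  /* chunks[i] stays NULL until first touched */
  size_t chunk_count;
} rev_bitmap_t;

static rev_bitmap_t *
rev_bitmap_create(long youngest_rev)
{
  rev_bitmap_t *bm = malloc(sizeof(*bm));
  bm->chunk_count = youngest_rev / REVS_PER_CHUNK + 1;
  /* Only the (small) table of chunk pointers is allocated up front. */
  bm->chunks = calloc(bm->chunk_count, sizeof(*bm->chunks));
  return bm;
}

static void
rev_bitmap_set(rev_bitmap_t *bm, long rev)
{
  size_t chunk = rev / REVS_PER_CHUNK;
  size_t bit = rev % REVS_PER_CHUNK;
  if (!bm->chunks[chunk])  /* allocate & zero the chunk on first use */
    bm->chunks[chunk] = calloc(REVS_PER_CHUNK / 8 + 1, 1);
  bm->chunks[chunk][bit / 8] |= (unsigned char)(1 << (bit % 8));
}

static int
rev_bitmap_get(const rev_bitmap_t *bm, long rev)
{
  size_t chunk = rev / REVS_PER_CHUNK;
  size_t bit = rev % REVS_PER_CHUNK;
  if (!bm->chunks[chunk])
    return 0;  /* untouched chunk: no revision reported here yet */
  return (bm->chunks[chunk][bit / 8] >> (bit % 8)) & 1;
}
```

Chunks that are never touched cost one NULL pointer each, so a log
operation that only walks recent history pays only for the chunks it
actually visits.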

This would also improve the situation for repositories like the ASF's:
there are many revisions, but usually only the most recent ones are
accessed.

What do you think?

-- 
Ivan Zhakov
CTO | VisualSVN | http://www.visualsvn.com
