On 29 April 2014 17:54, Stefan Fuhrmann <stefan.fuhrm...@wandisco.com> wrote:
> On Mon, Apr 28, 2014 at 8:11 AM, Ivan Zhakov <i...@visualsvn.com> wrote:
>>
>> On 27 April 2014 19:27, <stef...@apache.org> wrote:
>> > Author: stefan2
>> > Date: Sun Apr 27 15:27:46 2014
>> > New Revision: 1590405
>> >
>> > URL: http://svn.apache.org/r1590405
>> > Log:
>> > More 'svn log -g' memory usage reduction. We use a hash to keep track
>> > of all revisions reported so far, i.e. easily a million.
>> >
>> Hi Stefan,
>>
>> Interesting findings, some comments below.
>>
>> > That is 48 bytes / rev, allocated in small chunks. The first results
>> > in 10s of MB dynamic memory usage while the other results in many 8k
>> > blocks being mmap()ed, risking reaching the per-process limit on some
>> > systems.
>> I don't understand this argument: why do small allocations result in
>> 10s of MB of memory usage? Doesn't the pool allocator aggregate small
>> memory allocations into 8k blocks?
>
> 1M x 48 bytes = 10s of MB. There are two problems
> I'm addressing here for 'svn log -g' (log without -g does
> not have those issues):

ack.
> * --limit applies to "top-level" revisions, not the merged ones.
>   If you log some integration branch, it may show only a
>   few top-level revs but, say, 100k merged revs. That is fine
>   with 1.8 and even more so with 1.9 as we deliver the info quickly.
>   But the server memory usage should remain in check even
>   for more extreme scenarios / repo sizes.
>
> * Some system-provided APR (1.5+ in particular) uses mmap
>   to allocate memory, i.e. for every block, e.g. 8k, there is a
>   separate mmap call. The Linux default is 65530 (sic!) mmap
>   regions per process. Slowly allocating pools can trigger OOM
>   errors after only 512MB actual memory usage (sum across
>   all threads). I already prepared a patch for that.
>
Ouch, I didn't know that. I was thinking that the mmap APR pool
allocator was experimental and not enabled by default.

>> > We introduce a simple packed bit array data structure to replace
>> > the hash. For repos < 100M revs, the initialization overhead is less
>> > than 1ms and will amortize as soon as more than 1% of all revs are
>> > reported.
>> >
>> It may be worth implementing the same trick we used in
>> membuffer_cache: use an array of bit arrays, one for every 100k
>> revisions, and initialize them lazily. I mean:
>> [0...99999] - bit array 0
>> [100000...199999] - bit array 1
>> ...
>>
>> It should be easy to implement.
>
> I gave it a try and it turned out not too horribly complex.
> See r1590982.

Great! But it may be worth keeping the original svn_bit_array and
adding a new svn_sparse_bit_array built as an array of svn_bit_array
objects, so things are separated into two micro layers.

-- 
Ivan Zhakov
CTO | VisualSVN | http://www.visualsvn.com
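
[Editor's illustration: a minimal sketch of the two-layer design discussed
above, with a packed bit array as the lower layer and a sparse wrapper that
lazily allocates one such array per fixed-size revision chunk. All names
and the layout are hypothetical and do not reproduce Subversion's actual
svn_bit_array API; the real code allocates from APR pools rather than
malloc. The point of the structure: at 1 bit per revision, a fully
populated million-revision range costs about 122 KB versus roughly 46 MB
for a 48-bytes-per-entry hash, and untouched chunks cost nothing.]

/* Two-layer bit array sketch: dense lower layer (bit_array) plus a
 * sparse upper layer (sparse_bit_array) that creates one dense chunk
 * per CHUNK_SIZE revisions on first write. */
#include <stdlib.h>
#include <stdio.h>

#define CHUNK_SIZE 100000           /* revisions covered per dense chunk */

/* Lower layer: fixed-size packed bit array, 1 bit per revision. */
typedef struct bit_array {
  unsigned char *bits;              /* CHUNK_SIZE bits, zero-initialized */
} bit_array;

static bit_array *bit_array_create(void)
{
  bit_array *ba = malloc(sizeof(*ba));
  ba->bits = calloc((CHUNK_SIZE + 7) / 8, 1);   /* 12.5 KB per chunk */
  return ba;
}

static void bit_array_set(bit_array *ba, unsigned long idx)
{
  ba->bits[idx / 8] |= (unsigned char)(1u << (idx % 8));
}

static int bit_array_get(const bit_array *ba, unsigned long idx)
{
  return (ba->bits[idx / 8] >> (idx % 8)) & 1;
}

/* Upper layer: array of lazily created dense chunks. */
typedef struct sparse_bit_array {
  bit_array **chunks;               /* NULL until a bit in the chunk is set */
  unsigned long chunk_count;
} sparse_bit_array;

static sparse_bit_array *sparse_create(unsigned long max_index)
{
  sparse_bit_array *sba = malloc(sizeof(*sba));
  sba->chunk_count = max_index / CHUNK_SIZE + 1;
  sba->chunks = calloc(sba->chunk_count, sizeof(bit_array *));
  return sba;
}

static void sparse_set(sparse_bit_array *sba, unsigned long idx)
{
  unsigned long chunk = idx / CHUNK_SIZE;
  if (sba->chunks[chunk] == NULL)           /* lazy initialization */
    sba->chunks[chunk] = bit_array_create();
  bit_array_set(sba->chunks[chunk], idx % CHUNK_SIZE);
}

static int sparse_get(const sparse_bit_array *sba, unsigned long idx)
{
  unsigned long chunk = idx / CHUNK_SIZE;
  if (sba->chunks[chunk] == NULL)           /* untouched chunk => all zero */
    return 0;
  return bit_array_get(sba->chunks[chunk], idx % CHUNK_SIZE);
}

int main(void)
{
  sparse_bit_array *seen = sparse_create(1600000);  /* ~1.6M revisions */
  sparse_set(seen, 1590405);
  printf("r1590405 reported: %d\n", sparse_get(seen, 1590405));
  printf("r42 reported:      %d\n", sparse_get(seen, 42));
  return 0;
}

A 'log -g' run that only touches recent revisions then pays for a single
12.5 KB chunk instead of the whole range, while set/get stay O(1); keeping
the dense array as its own layer, as proposed above, lets callers that
cover the full range skip the sparse wrapper entirely.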