On 27 April 2014 19:27, <stef...@apache.org> wrote:
> Author: stefan2
> Date: Sun Apr 27 15:27:46 2014
> New Revision: 1590405
>
> URL: http://svn.apache.org/r1590405
> Log:
> More 'svn log -g' memory usage reduction. We use a hash to keep track
> of all revisions reported so far, i.e. easily a million.
Hi Stefan,

Interesting findings, some comments below.
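For context, my mental model of the change: replace the per-revision
hash entry (~48 bytes each) with a single bit per revision, i.e.
something like the following (an illustrative sketch on top of APR
pools; the names are mine, not the committed API):

[[[
#include <apr_pools.h>

/* Illustrative only -- not the code from r1590405. */
typedef struct bit_array_t
{
  unsigned char *bits;   /* packed bits, one per revision */
  apr_size_t size;       /* number of bits */
} bit_array_t;

static bit_array_t *
bit_array_create(apr_size_t size, apr_pool_t *pool)
{
  bit_array_t *ba = apr_palloc(pool, sizeof(*ba));
  ba->size = size;
  ba->bits = apr_pcalloc(pool, (size + 7) / 8);  /* zero-filled */
  return ba;
}

static void
bit_array_set(bit_array_t *ba, apr_size_t idx)
{
  ba->bits[idx / 8] |= (unsigned char)(1 << (idx % 8));
}

static int
bit_array_get(const bit_array_t *ba, apr_size_t idx)
{
  return (ba->bits[idx / 8] >> (idx % 8)) & 1;
}
]]]

At one bit per revision, a million revisions take ~122 KB in a single
allocation, versus tens of MB for the hash.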
> That is 48 bytes / rev, allocated in small chunks. The first results
> in 10s of MB dynamic memory usage while the other results in many 8k
> blocks being mmap()ed risking reaching the per-process limit on some
> systems.
I don't understand this argument: why do small allocations result in
tens of MB of memory usage? Doesn't the pool allocator aggregate small
allocations into 8k blocks anyway?

> We introduce a simple packed bit array data structure to replace
> the hash. For repos < 100M revs, the initialization overhead is less
> than 1ms and will amortize as soon as more than 1% of all revs are
> reported.
>
It may be worth implementing the same trick we used in membuffer_cache:
use an array of bit arrays, one per 100k revisions for example, and
initialize them lazily (see the sketch in the P.S. below). I mean:

[0...99999]       - bit array 0
[100000...199999] - bit array 1
...

It should be easy to implement. This would also help repositories like
the ASF's: there are many revisions, but usually only the most recent
ones are accessed.

What do you think?

--
Ivan Zhakov
CTO | VisualSVN | http://www.visualsvn.com
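P.S. A rough sketch of the chunked, lazily initialized bit array I have
in mind (illustrative only: the 100k constant and all names are made up
here, and the membuffer_cache code differs in detail):

[[[
#include <apr_pools.h>

#define CHUNK_REVS  100000            /* revisions per sub-array */
#define CHUNK_BYTES (CHUNK_REVS / 8)  /* 12500 bytes per sub-array */

typedef struct lazy_bit_array_t
{
  unsigned char **chunks;   /* chunks[i] stays NULL until first use */
  apr_size_t chunk_count;
  apr_pool_t *pool;
} lazy_bit_array_t;

static lazy_bit_array_t *
lazy_bit_array_create(apr_size_t rev_count, apr_pool_t *pool)
{
  lazy_bit_array_t *ba = apr_palloc(pool, sizeof(*ba));
  ba->chunk_count = (rev_count + CHUNK_REVS - 1) / CHUNK_REVS;
  ba->chunks = apr_pcalloc(pool, ba->chunk_count * sizeof(*ba->chunks));
  ba->pool = pool;
  return ba;
}

static void
lazy_bit_array_set(lazy_bit_array_t *ba, apr_size_t rev)
{
  apr_size_t i = rev / CHUNK_REVS;
  apr_size_t bit = rev % CHUNK_REVS;

  if (ba->chunks[i] == NULL)      /* allocate + zero on first touch */
    ba->chunks[i] = apr_pcalloc(ba->pool, CHUNK_BYTES);
  ba->chunks[i][bit / 8] |= (unsigned char)(1 << (bit % 8));
}

static int
lazy_bit_array_get(const lazy_bit_array_t *ba, apr_size_t rev)
{
  apr_size_t i = rev / CHUNK_REVS;
  apr_size_t bit = rev % CHUNK_REVS;

  if (ba->chunks[i] == NULL)      /* untouched chunk => nothing set */
    return 0;
  return (ba->chunks[i][bit / 8] >> (bit % 8)) & 1;
}
]]]

Only the chunks that are actually touched get allocated, so a 'log -g'
over recent history of a multi-million-revision repository pays for one
or two 12.5 KB chunks instead of the whole array.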