Hello Eric, thanks for looking into it.

>> git-cat-file has outgrown the parent perl process several times
>> (git-cat-file - ~3-4Gb, perl - 400Mb).

> Ugh, that sucks.
> Even the 400Mb size of Perl annoys me greatly and I'd work
> on fixing it if I had more time.

I was going to look at this problem also, but first I'd like to improve the 
situation with cat-file as on large repos it is larger problem. By the way, 
what direction would you suggest to begin with?

> A few more questions:

> * What is the largest file that existed in that repo?

About 2.5M

> * Did you try "MALLOC_MMAP_THRESHOLD_" with glibc?

Have just tried it on a smaller repo (which takes about 1 hour to clone and RSS 
grows from 4M to 40M during the process. Unfortunately there is no much of an 
effect: max RSS is 41M with default settings and 38M with 
MALLOC_MMAP_THRESHOLD_=131072.

> If alloc.c is the culprit, I would consider to transparently restart
"cat-file --batch" once it grows to a certain size or after a certain
number of requests are made to it.

alloc.c interface is not used in cat-file at all, only direct calls to xmalloc 
and xrealloc from wrapper.c, and also xmmap() from sha1_file.c.

> > git-cat-file has outgrown the parent perl process several times
> > (git-cat-file - ~3-4Gb, perl - 400Mb).

> How much of that is anonymous memory, though?

Haven't measured on this particular repo: didn't redo the 2 week experiment =) 
However I checked on a smaller test repo and anon memory is about 12M out of 
40M total. Most of memory is really taken by mmaped *.pack and *idx files.

Actually I accidentally found out that if I export GIT_MALLOC_LIMIT variable 
set to several megabytes it has the following effect:
 * git-svn.perl launches git-gc
 * git-gc can't allocate enough memory and thus doesn't create any pack files
 * git-cat-file works only with pure blob object, not packs, and it's memory 
usage doesn't grow larger than 4-5M

It gave me a thought that maybe we could get rid of "git gc" calls after each 
commit in perl code and just perform one large gc operation at the end. It will 
cost disk space during clone but save us memory. What do you think?

As for your suggestion regarding periodic restart of batch process inside 
git-cat-file, I think we could give it a try, I can prepare a patch and run 
some tests.

--
Best Regards,
Victor
________________________________________
From: Eric Wong [normalper...@yhbt.net]
Sent: Tuesday, September 22, 2015 5:35 PM
To: Victor Leschuk
Cc: Junio C Hamano; git@vger.kernel.org
Subject: Re: [PATCH] git-svn: make batch mode optional for git-cat-file

Eric Wong <normalper...@yhbt.net> wrote:
> Victor Leschuk <vlesc...@accesssoftek.com> wrote:
> > The thing is that git-cat-file keeps growing during work when running
> > in "batch" mode. See the figure attached: it is for cloning a rather
> > small repo (1 hour to clone about ~14000 revisions). However the clone
> > of a large repo (~280000 revisions) took about 2 weeks and
> > git-cat-file has outgrown the parent perl process several times
> > (git-cat-file - ~3-4Gb, perl - 400Mb).

How much of that is anonymous memory, though?
(pmap $PID_OF_GIT_CAT_FILE)

Running the following on the Linux kernel tree I had lying around:

(for i in $(seq 100 200); do git ls-files | sed -e "s/^/HEAD~$i:/"; done)|\
  git cat-file --batch >/dev/null

Reveals about 510M RSS in top, but pmap says less than 20M of that
is anonymous.  So the rest are mmap-ed packfiles; that RSS gets
transparently released back to the kernel under memory pressure.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to