Interesting. I tried "git gc --aggressive" on Mark's converted
repository, and the result is:
netbeans-import/.git$ du -hs .
792M    .

The original was:
netbeans-import.git $ du -hs .
3,5G    .

(IIRC Mark was converting http://hg.netbeans.org/main, not releases, so the
repository is a little bit smaller than the releases one.)

I tried:
$ git log -p | sha1sum

on both repositories, and the hashes appear to be the same. I also tried to
clone the gc-ed repository using git clone --bare --no-local, and the
resulting repository is still about the same size. So, this seems good to
me, unless there is some downside I don't know about.
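
For anyone who wants to repeat the comparison, here is a minimal sketch of the
check described above. The function names (log_digest, same_history) and the
paths in the usage comment are placeholders of mine, not part of any existing
script. Note that plain "git log -p" only walks the history reachable from
HEAD, so this does not prove that every branch survived the gc.

```shell
#!/bin/sh
# Sketch: compare the patch history of two repositories by hashing the
# output of `git log -p`. Helper names are hypothetical.

# Print the SHA-1 of `git log -p` for the repository at path $1.
# Works for bare repositories and for a .git directory as well.
log_digest() {
    git -C "$1" log -p | sha1sum | cut -d' ' -f1
}

# Succeed (exit status 0) when both repositories yield the same digest.
same_history() {
    [ "$(log_digest "$1")" = "$(log_digest "$2")" ]
}

# Example usage (both paths are placeholders):
#   same_history netbeans-import.git netbeans-import/.git && echo "histories match"
```

Adding --all to the git log call would cover every ref instead of just HEAD,
at the cost of a longer run on a repository of this size.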

Jan


On Wed, Nov 23, 2016 at 8:26 PM, Emilian Bold <emilian.b...@gmail.com>
wrote:

> Actually I don't believe the data loss is that large. (There may also be
> mercurial commits that are intentionally ignored by the conversion script,
> like commits that only add tags?)
>
> hg log | grep '^changeset:' | wc -l
>   313209
>
> git log | grep '^commit ' | wc -l
>   301478
>
> So there is a difference of 11731 commits (about 4%) but those couldn't
> have such a large impact on repository size.
>
> I hope somebody else is willing to work with me on this so we document
> everything and do a reproducible repository conversion.
>
>
>
> --emi
>
> On Wed, Nov 23, 2016 at 9:10 PM, Emilian Bold <emilian.b...@gmail.com>
> wrote:
>
> > Well, I dunno what black magic `gc --aggressive` does but the repository
> > is 0.85GB now!
> >
> > I also ran `git reflog expire` first but it didn't change the size at
> > all.
> >
> > One thing to keep in mind is that I used --force although I had 6 commits
> > with the warning "repository has at least one unnamed head". These were
> > probably all close-branch commits (hg commit --close-branch).
> >
> > So I might have data loss(!) since I believe I read that
> > hg-fast-export.sh picks only one unnamed head as the migration winner. I
> > wonder if the gc command didn't just purge a lot of valid commits from
> > such an unnamed head and that's why the repository became so small.
> >
> > Could somebody else try a test repository conversion and validate my
> > numbers?
> >
> > git gc --aggressive --prune=now
> > Counting objects: 4085031, done.
> > Delta compression using up to 8 threads.
> > Compressing objects: 100% (2909203/2909203), done.
> > Writing objects: 100% (4085031/4085031), done.
> > Total 4085031 (delta 2150468), reused 1585934 (delta 0)
> > Checking connectivity: 4085031, done.
> >
> >
> >
> > --emi
> >
> > On Wed, Nov 23, 2016 at 7:59 PM, Paul Merlin <paulmer...@apache.org>
> > wrote:
> >
> >> Hi Emilian,
> >>
> >> > I see hg-fast-export.sh finished at some point.
> >> >
> >> > As expected though, git does not have any of the disk space gains. The
> >> > converted git releases/ repository is 3.6GB.
> >>
> >> Just a thought.
> >> Did you try some git cleanups after the conversion?
> >>
> >> git reflog expire --expire=now --all
> >> git gc --aggressive --prune=now
> >>
> >> Cheers
> >>
> >>
> >> > In case these statistics mean something:
> >> >
> >> > git-fast-import statistics:
> >> > ---------------------------------------------------------------------
> >> > Alloc'd objects:    4090000
> >> > Total objects:      4085509 (  40220100 duplicates                  )
> >> >       blobs  :      1036365 (  28386238 duplicates     858087 deltas of     969684 attempts)
> >> >       trees  :      2735935 (  11833862 duplicates    1370606 deltas of    2613480 attempts)
> >> >       commits:       313209 (         0 duplicates          0 deltas of          0 attempts)
> >> >       tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
> >> > Total branches:        1283 (       346 loads     )
> >> >       marks:        1048576 (    313209 unique    )
> >> >       atoms:         124011
> >> > Memory total:        218429 KiB
> >> >        pools:         26711 KiB
> >> >      objects:        191718 KiB
> >> > ---------------------------------------------------------------------
> >> > pack_report: getpagesize()            =       4096
> >> > pack_report: core.packedGitWindowSize = 1073741824
> >> > pack_report: core.packedGitLimit      = 8589934592
> >> > pack_report: pack_used_ctr            =   39000045
> >> > pack_report: pack_mmap_calls          =     733040
> >> > pack_report: pack_open_windows        =          4 /          7
> >> > pack_report: pack_mapped              = 4280730006 / 6950823920
> >> > ---------------------------------------------------------------------
> >> >
> >> >
> >> > --emi
> >> >
> >> > On Fri, Nov 18, 2016 at 1:32 PM, Emilian Bold <emilian.b...@gmail.com>
> >> > wrote:
> >> >
> >> >> A releases/ clone which on my system takes 3.8GB is reduced to 1.6GB
> >> >> with the generaldelta and aggressivemergedeltas flags (took about 14
> >> >> hours).
> >> >>
> >> >> Pretty impressive!
> >> >>
> >> >> Converting to git with hg-fast-export.sh complains that "repository
> >> >> has at least one unnamed head" for about 6 revisions. With --force I'm
> >> >> able to start the conversion but it hasn't finished yet.
> >> >>
> >> >> The git conversion is about 35% done and already using 1.3GB.
> >> >>
> >> >> So... I assume it's going to need about 3.8GB, just like the original
> >> >> repository.
> >> >>
> >> >> I wonder if git has similar space-saving tricks?
> >> >>
> >> >>
> >> >>
> >> >> --emi
> >> >>
> >> >> On Thu, Nov 17, 2016 at 8:46 AM, Emilian Bold <emilian.b...@gmail.com>
> >> >> wrote:
> >> >>
> >> >>> Forgot about this. I've just started the Mercurial repository
> >> >>> conversion which will take a few hours.
> >> >>>
> >> >>> Will report tomorrow or when it's done.
> >> >>>
> >> >>>
> >> >>> --emi
> >> >>>
> >> >>> On Wed, Nov 16, 2016 at 11:18 PM, cowwoc <cow...@bbs.darktech.org>
> >> >>> wrote:
> >> >>>
> >> >>>> Hi Emilian,
> >> >>>>
> >> >>>> Any update on this?
> >> >>>>
> >> >>>> Thanks,
> >> >>>> Gili
> >> >>>>
> >> >>>>
> >> >>>> On 2016-11-11 01:33 (-0500), Emilian Bold <e...@gmail.com> wrote:
> >> >>>>> Thank you for following through with this after we talked on IRC.
> >> >>>>>
> >> >>>>> I will check the size reduction for the releases/ repo later.
