Hello,

Now that I have found free Fossil hosting, I can publish the changes I had
been experimenting with:

https://chiselapp.com/user/etanol/repository/fossil-better-import/timeline?r=git-better-import

What began as a way to handle copies and renames correctly has ended up
supporting delta manifests too.  I used two Git repositories for
testing: SQLAlchemy [1] and Django [2].

[1] https://github.com/zzzeek/sqlalchemy
[2] https://github.com/django/django

The Django repository turned out to be a surprisingly complete test
subject, as it helped reveal some tricky rename cases, such as renaming or
copying a file onto an already existing one, or onto a file that was
recently deleted.  It also contains some paths with UTF-8 sequences in them,
which Git exports as octal escape sequences.
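
For anyone who wants to check such paths by hand, here is a minimal
sketch in Python of how a quoted path could be decoded.  It is not the
code in the branch; the function name unquote_path is made up for
illustration, and it assumes Git's usual C-style quoting (path wrapped
in double quotes, three-digit octal escapes for non-ASCII bytes, UTF-8
path encoding).

```python
import re

# Single-character escapes used by C-style quoting.
_SIMPLE = {
    b"a": b"\a", b"b": b"\b", b"f": b"\f", b"n": b"\n",
    b"r": b"\r", b"t": b"\t", b"v": b"\v", b'"': b'"', b"\\": b"\\",
}

# A backslash followed by a three-digit octal byte or a simple escape.
_ESCAPE = re.compile(rb'\\([0-3][0-7]{2}|[abfnrtv"\\])')


def unquote_path(raw: bytes) -> str:
    """Decode a path as it may appear in a fast-export stream.

    Hypothetical helper: unquoted paths are returned as-is, quoted
    paths have their escapes expanded.  Assumes UTF-8 paths.
    """
    quoted = len(raw) >= 2 and raw.startswith(b'"') and raw.endswith(b'"')
    if not quoted:
        return raw.decode("utf-8")

    def expand(match):
        token = match.group(1)
        if token in _SIMPLE:
            return _SIMPLE[token]
        # Octal escape: one raw byte of the encoded path.
        return bytes([int(token, 8)])

    return _ESCAPE.sub(expand, raw[1:-1]).decode("utf-8")


if __name__ == "__main__":
    # The two octal escapes are the UTF-8 bytes 0xC3 0xA9.
    print(unquote_path(b'"caf\\303\\251.txt"'))
```

The escapes are expanded on bytes first and decoded afterwards,
because a single non-ASCII character spans several octal escapes.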

I also took the chance to compare the generated repositories, with and
without using delta manifests.  The command used to convert was:

  git fast-export --all -M -C --signed-tags=strip | fossil import --delta --force ../f-repo.fossil

After converting, each fossil repository was compacted again with:

  fossil rebuild --cluster --compress --vacuum ../f-repo.fossil

The resulting stats for SQLAlchemy, without delta manifests, are:

  repository-size:   29233152 bytes (29.2MB)
  artifact-count:    40045 (stored as 1079 full text and 38966 delta blobs)
  artifact-sizes:    39969 average, 776048 max, 1600535301 bytes (1.6GB) total
  compression-ratio: 54:1
  checkins:          9266
  files:             1066 across all branches
  wikipages:         0 (0 changes)
  tickets:           0 (0 changes)
  events:            0
  tagchanges:        99
  project-age:       2973 days or approximately 8.14 years.
  project-id:        06b4b25ccddefde7bfc5723a4c88f5de83281f6f
  fossil-version:    2013-06-18 21:09:23 [c9cb6e7293] [1.26] (gcc-4.8.1)
  sqlite-version:    2013-05-15 18:34:17 [00231fb012] (3.7.17)
  database-stats:    28548 pages, 1024 bytes/pg, 0 free pages, UTF-8, delete mode

With delta manifests:

  repository-size:   29512704 bytes (29.5MB)
  artifact-count:    40045 (stored as 1742 full text and 38303 delta blobs)
  artifact-sizes:    35472 average, 776048 max, 1420465350 bytes (1.4GB) total
  compression-ratio: 48:1
  checkins:          9266
  files:             1066 across all branches
  wikipages:         0 (0 changes)
  tickets:           0 (0 changes)
  events:            0
  tagchanges:        99
  project-age:       2973 days or approximately 8.14 years.
  project-id:        9d1199e3635399dcc0e7dc844483721069413cea
  fossil-version:    2013-06-18 21:09:23 [c9cb6e7293] [1.26] (gcc-4.8.1)
  sqlite-version:    2013-05-15 18:34:17 [00231fb012] (3.7.17)
  database-stats:    28821 pages, 1024 bytes/pg, 0 free pages, UTF-8, delete mode

That is not much of an improvement.  However, the Django case is more
impressive.  Without delta manifests:

  repository-size:   92678144 bytes (92.7MB)
  artifact-count:    87593 (stored as 6213 full text and 81380 delta blobs)
  artifact-sizes:    64983 average, 464143 max, 5692046471 bytes (5.7GB) total
  compression-ratio: 61:1
  checkins:          21693
  files:             8153 across all branches
  wikipages:         0 (0 changes)
  tickets:           0 (0 changes)
  events:            0
  tagchanges:        83
  project-age:       2961 days or approximately 8.11 years.
  project-id:        d8e5a69659a85d8abc2405190086a10fca66c6fb
  fossil-version:    2013-06-18 21:09:23 [c9cb6e7293] [1.26] (gcc-4.8.1)
  sqlite-version:    2013-05-15 18:34:17 [00231fb012] (3.7.17)
  database-stats:    90506 pages, 1024 bytes/pg, 0 free pages, UTF-8, delete mode

And, with delta manifests:

  repository-size:   93489152 bytes (93.5MB)
  artifact-count:    87593 (stored as 6862 full text and 80731 delta blobs)
  artifact-sizes:    16365 average, 460946 max, 1433448191 bytes (1.4GB) total
  compression-ratio: 15:1
  checkins:          21693
  files:             8153 across all branches
  wikipages:         0 (0 changes)
  tickets:           0 (0 changes)
  events:            0
  tagchanges:        83
  project-age:       2961 days or approximately 8.11 years.
  project-id:        7bd14b77e1396b5851bb1b3c2affaf948f01b9d0
  fossil-version:    2013-06-18 21:09:23 [c9cb6e7293] [1.26] (gcc-4.8.1)
  sqlite-version:    2013-05-15 18:34:17 [00231fb012] (3.7.17)
  database-stats:    91298 pages, 1024 bytes/pg, 0 free pages, UTF-8, delete mode

The curious thing here is that, when using delta manifests, the compression
ratio is lower.  Furthermore, the on-disk size of the repository increases
slightly.  Still, the difference in the sum of artifact sizes is huge:
5.7GB vs 1.4GB.
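
The lower ratio is less surprising once you do the arithmetic.  The
reported ratios are consistent with dividing the total artifact size by
the repository size and truncating; that formula is my reading of the
numbers, not something taken from the stat output itself.  A small
Python sketch using only the figures above (the names STATS and ratio
are made up for illustration):

```python
# (repository-size, total artifact bytes) copied from the stats above.
STATS = {
    ("SQLAlchemy", "full manifests"): (29233152, 1600535301),
    ("SQLAlchemy", "delta manifests"): (29512704, 1420465350),
    ("Django", "full manifests"): (92678144, 5692046471),
    ("Django", "delta manifests"): (93489152, 1433448191),
}


def ratio(repo_bytes: int, total_bytes: int) -> int:
    """Assumed formula: uncompressed total over on-disk size, truncated."""
    return total_bytes // repo_bytes


if __name__ == "__main__":
    for (project, mode), (repo_bytes, total_bytes) in STATS.items():
        print(f"{project:10} {mode:16} {ratio(repo_bytes, total_bytes)}:1")

    # How much the on-disk size changes when delta manifests are used.
    for project in ("SQLAlchemy", "Django"):
        full_repo, _ = STATS[(project, "full manifests")]
        delta_repo, _ = STATS[(project, "delta manifests")]
        growth = 100.0 * (delta_repo - full_repo) / full_repo
        print(f"{project:10} on-disk growth: {growth:.2f}%")
```

The computed ratios match the reported 54:1, 48:1, 61:1 and 15:1.  With
delta manifests the numerator (total artifact bytes) shrinks a lot while
the denominator (bytes on disk) barely moves, so the ratio has to drop.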

Anyone interested, please test to see whether there are more hidden bugs to
be fixed.  But beware that the import process can be lengthy.  The
Django repository took around twenty minutes to import.  I
previously tried importing the glibc repository on a machine with 16GB
of RAM but, after a couple of hours, I interrupted the process.

So if any of you is considering importing the Linux kernel repository as
a test, think twice.

Best regards.

-- 
Isaac Jurado

"The noblest pleasure is the joy of understanding."
                                  Leonardo da Vinci