Hello,

Now that I have found free Fossil hosting, I can publish the changes I had been experimenting with:

    https://chiselapp.com/user/etanol/repository/fossil-better-import/timeline?r=git-better-import

What began as a way to correctly handle copies and renames has ended up supporting delta manifests too.

I used two Git repositories for testing: SQLAlchemy [1] and Django [2].

[1] https://github.com/zzzeek/sqlalchemy
[2] https://github.com/django/django

The Django repository turned out to be a surprisingly complete test subject, as it helped reveal some rename cases such as renaming or copying a file onto an already existing one, or onto a file that was recently deleted. It also contains some paths with UTF-8 sequences in them, which Git exports as octal escape sequences.

I also took the chance to compare the generated repositories, with and without delta manifests. The command used to convert was:

    git fast-export --all -M -C --signed-tags=strip | fossil import --delta --force ../f-repo.fossil

After converting, each Fossil repository was compacted again with:

    fossil rebuild --cluster --compress --vacuum ../f-repo.fossil

The resulting stats for SQLAlchemy, without delta manifests, are:

    repository-size:    29233152 bytes (29.2MB)
    artifact-count:     40045 (stored as 1079 full text and 38966 delta blobs)
    artifact-sizes:     39969 average, 776048 max, 1600535301 bytes (1.6GB) total
    compression-ratio:  54:1
    checkins:           9266
    files:              1066 across all branches
    wikipages:          0 (0 changes)
    tickets:            0 (0 changes)
    events:             0
    tagchanges:         99
    project-age:        2973 days or approximately 8.14 years.
    project-id:         06b4b25ccddefde7bfc5723a4c88f5de83281f6f
    fossil-version:     2013-06-18 21:09:23 [c9cb6e7293] [1.26] (gcc-4.8.1)
    sqlite-version:     2013-05-15 18:34:17 [00231fb012] (3.7.17)
    database-stats:     28548 pages, 1024 bytes/pg, 0 free pages, UTF-8, delete mode

With delta manifests:

    repository-size:    29512704 bytes (29.5MB)
    artifact-count:     40045 (stored as 1742 full text and 38303 delta blobs)
    artifact-sizes:     35472 average, 776048 max, 1420465350 bytes (1.4GB) total
    compression-ratio:  48:1
    checkins:           9266
    files:              1066 across all branches
    wikipages:          0 (0 changes)
    tickets:            0 (0 changes)
    events:             0
    tagchanges:         99
    project-age:        2973 days or approximately 8.14 years.
    project-id:         9d1199e3635399dcc0e7dc844483721069413cea
    fossil-version:     2013-06-18 21:09:23 [c9cb6e7293] [1.26] (gcc-4.8.1)
    sqlite-version:     2013-05-15 18:34:17 [00231fb012] (3.7.17)
    database-stats:     28821 pages, 1024 bytes/pg, 0 free pages, UTF-8, delete mode

That is not much of an improvement. However, the Django case is more impressive. Without delta manifests:

    repository-size:    92678144 bytes (92.7MB)
    artifact-count:     87593 (stored as 6213 full text and 81380 delta blobs)
    artifact-sizes:     64983 average, 464143 max, 5692046471 bytes (5.7GB) total
    compression-ratio:  61:1
    checkins:           21693
    files:              8153 across all branches
    wikipages:          0 (0 changes)
    tickets:            0 (0 changes)
    events:             0
    tagchanges:         83
    project-age:        2961 days or approximately 8.11 years.
    project-id:         d8e5a69659a85d8abc2405190086a10fca66c6fb
    fossil-version:     2013-06-18 21:09:23 [c9cb6e7293] [1.26] (gcc-4.8.1)
    sqlite-version:     2013-05-15 18:34:17 [00231fb012] (3.7.17)
    database-stats:     90506 pages, 1024 bytes/pg, 0 free pages, UTF-8, delete mode

And, with delta manifests:

    repository-size:    93489152 bytes (93.5MB)
    artifact-count:     87593 (stored as 6862 full text and 80731 delta blobs)
    artifact-sizes:     16365 average, 460946 max, 1433448191 bytes (1.4GB) total
    compression-ratio:  15:1
    checkins:           21693
    files:              8153 across all branches
    wikipages:          0 (0 changes)
    tickets:            0 (0 changes)
    events:             0
    tagchanges:         83
    project-age:        2961 days or approximately 8.11 years.
    project-id:         7bd14b77e1396b5851bb1b3c2affaf948f01b9d0
    fossil-version:     2013-06-18 21:09:23 [c9cb6e7293] [1.26] (gcc-4.8.1)
    sqlite-version:     2013-05-15 18:34:17 [00231fb012] (3.7.17)
    database-stats:     91298 pages, 1024 bytes/pg, 0 free pages, UTF-8, delete mode

The curious thing here is that the compression ratio is lower when using delta manifests. Furthermore, the size of the repository on disk increases slightly. Still, the difference in the sum of artifact sizes is huge: 5.7GB vs 1.4GB.

If anybody is interested, please test it to see whether there are more hidden bugs to be fixed. But beware that the import process can be lengthy. The Django repository took around twenty minutes to import. I previously tried importing the glibc repository on a machine with 16GB of RAM but, after a couple of hours, I interrupted the process. So if any of you is considering importing the Linux kernel repository as a test, think twice.

Best regards.

--
Isaac Jurado

"The noblest pleasure is the joy of understanding."
Leonardo da Vinci

_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
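
A note on the compression-ratio figures reported above: the lower ratio with delta manifests is probably less strange than it looks. The sketch below is only a sanity check, under the assumption (mine, not taken from the Fossil sources) that the reported ratio is the total artifact size divided by the repository size, rounded down. The names REPORTS and compression_ratio are made up for illustration; the numbers are copied from the four reports.

```python
# Sanity check of the reported compression ratios.
# Assumption: ratio = total artifact bytes // repository bytes (rounded down).
# REPORTS and compression_ratio are hypothetical names for this sketch.

REPORTS = {
    # label: (repository-size in bytes, total artifact bytes)
    "SQLAlchemy, no delta manifests": (29233152, 1600535301),
    "SQLAlchemy, delta manifests": (29512704, 1420465350),
    "Django, no delta manifests": (92678144, 5692046471),
    "Django, delta manifests": (93489152, 1433448191),
}


def compression_ratio(repo_bytes: int, artifact_bytes: int) -> int:
    """Return N for an N:1 ratio, rounding down."""
    return artifact_bytes // repo_bytes


if __name__ == "__main__":
    for label, (repo_bytes, artifact_bytes) in REPORTS.items():
        ratio = compression_ratio(repo_bytes, artifact_bytes)
        print(f"{label}: {ratio}:1")
```

Under that assumption the sketch reproduces 54:1, 48:1, 61:1 and 15:1. The ratio drops because delta manifests shrink the uncompressed total (the numerator) while the file on disk stays about the same size, so a lower ratio here does not mean worse storage.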