Hi there,

For those of you who are crazy about facts and numbers, I have attached
some data size info for 3 large FSFS repositories. They are 1.8-format
mirrors of the Apache, KDE and wordpress repositories. I used my new
fsfs-stats tool to extract the info.

Some of my findings:

* Apache: lots of large zip files added lately (low overall compression
  rate, but the tool does not list zip files etc. as the reason - yet)
* KDE: still larger than Apache, with an excellent compression ratio
  (lots of large .po files); >1TB
* Wordpress: directory compression eliminated directory storage overhead
  (5000% => <10%)
* rep sharing is most effective when you have many "casual" users
  (> factor 2 in wordpress; 25% savings for Apache; insignificant for KDE
  since po files are not shared / identical between branches)
* noderevs + changes lists take up 10..30% of the total repo size, i.e.
  the actual content is already well compressed
* more distinct file prop reps than I thought (probably due to old
  per-file merge info)
* >50% of all nodes in the Apache repo have props
* rep sharing + deltification brings prop info down to ~10 bytes / rev
  for Apache

-- Stefan^2.

--
Certified & Supported Apache Subversion Downloads:
* http://www.wandisco.com/subversion/download *

Global statistics:
  43,571,200,544 bytes in 1,407,978 revisions
  1,719,438,790 bytes in 11,919,461 changes
  8,527,042,341 bytes in 28,631,286 node revision records
  32,606,404,032 bytes in 26,042,259 representations
  175,991,665,585 bytes expanded representation size
  232,589,088,405 bytes with rep-sharing off

Noderev statistics:
  8,527,042,341 bytes in 28,631,286 nodes total
  4,529,280,752 bytes in 18,195,547 directory noderevs
  3,997,761,589 bytes in 10,435,739 file noderevs

Representation statistics:
  32,606,404,032 bytes in 26,042,259 representations total
  1,206,577,410 bytes in 17,999,442 directory representations
  31,386,080,727 bytes in 7,866,975 file representations
  7,936,967 bytes in 102,123 directory property representations
  5,808,928 bytes in 73,719 file property representations
  703,824,567 bytes in header & footer overhead

Directory representation statistics:
  1,206,577,410 bytes in 17,999,442 reps
  7,198,044 bytes in 76,251 shared reps
  14,900,076,043 bytes expanded size
  54,380,469 bytes expanded shared size
  15,067,384,452 bytes with rep-sharing off
  140,449 shared references

File representation statistics:
  31,386,080,727 bytes in 7,866,975 reps
  6,957,017,837 bytes in 1,308,907 shared reps
  160,724,606,881 bytes expanded size
  26,699,217,946 bytes expanded shared size
  215,992,591,222 bytes with rep-sharing off
  2,568,681 shared references

Directory property representation statistics:
  7,936,967 bytes in 102,123 reps
  2,435,475 bytes in 30,208 shared reps
  236,898,639 bytes expanded size
  48,224,988 bytes expanded shared size
  959,652,575 bytes with rep-sharing off
  3,267,341 shared references

File property representation statistics:
  5,808,928 bytes in 73,719 reps
  691,141 bytes in 8,936 shared reps
  130,084,022 bytes expanded size
  4,241,945 bytes expanded shared size
  569,460,156 bytes with rep-sharing off
  6,554,789 shared references

Global statistics:
  42,516,758,377 bytes in 1,325,037 revisions
  2,112,852,964 bytes in 18,163,503 changes
  9,918,750,627 bytes in 31,461,675 node revision records
  29,614,818,603 bytes in 29,269,280 representations
  1,114,881,994,595 bytes expanded representation size
  1,155,846,558,984 bytes with rep-sharing off

Noderev statistics:
  9,918,750,627 bytes in 31,461,675 nodes total
  3,641,226,857 bytes in 14,233,846 directory noderevs
  6,277,523,770 bytes in 17,227,829 file noderevs

Representation statistics:
  29,614,818,603 bytes in 29,269,280 representations total
  1,411,801,736 bytes in 14,143,671 directory representations
  28,200,181,907 bytes in 15,087,277 file representations
  1,465,071 bytes in 17,885 directory property representations
  1,369,889 bytes in 20,447 file property representations
  856,408,582 bytes in header & footer overhead

Directory representation statistics:
  1,411,801,736 bytes in 14,143,671 reps
  5,670,142 bytes in 51,339 shared reps
  26,884,721,654 bytes expanded size
  61,486,365 bytes expanded shared size
  26,955,905,794 bytes with rep-sharing off
  63,390 shared references

File representation statistics:
  28,200,181,907 bytes in 15,087,277 reps
  3,087,013,223 bytes in 1,136,350 shared reps
  1,087,898,597,508 bytes expanded size
  23,485,645,700 bytes expanded shared size
  1,126,563,329,834 bytes with rep-sharing off
  2,140,551 shared references

Directory property representation statistics:
  1,465,071 bytes in 17,885 reps
  782,037 bytes in 8,669 shared reps
  93,340,801 bytes expanded size
  30,873,623 bytes expanded shared size
  1,374,010,811 bytes with rep-sharing off
  8,070,095 shared references

File property representation statistics:
  1,369,889 bytes in 20,447 reps
  188,512 bytes in 3,028 shared reps
  5,334,632 bytes expanded size
  855,812 bytes expanded shared size
  953,312,545 bytes with rep-sharing off
  9,041,782 shared references

Global statistics:
  8,233,212,081 bytes in 507,189 revisions
  336,363,580 bytes in 3,473,008 changes
  1,205,197,688 bytes in 5,125,527 node revision records
  6,610,608,683 bytes in 3,175,300 representations
  416,559,053,291 bytes expanded representation size
  440,976,526,859 bytes with rep-sharing off

Noderev statistics:
  1,205,197,688 bytes in 5,125,527 nodes total
  403,048,125 bytes in 2,263,745 directory noderevs
  802,149,563 bytes in 2,861,782 file noderevs

Representation statistics:
  6,610,608,683 bytes in 3,175,300 representations total
  428,471,684 bytes in 2,111,717 directory representations
  6,181,996,505 bytes in 1,061,535 file representations
  116,243 bytes in 1,742 directory property representations
  24,251 bytes in 306 file property representations
  75,980,107 bytes in header & footer overhead

Directory representation statistics:
  428,471,684 bytes in 2,111,717 reps
  5,577,314 bytes in 36,636 shared reps
  398,462,596,403 bytes expanded size
  79,861,877 bytes expanded shared size
  398,549,277,881 bytes with rep-sharing off
  42,953 shared references

File representation statistics:
  6,181,996,505 bytes in 1,061,535 reps
  3,029,368,482 bytes in 446,128 shared reps
  18,096,237,254 bytes expanded size
  7,064,016,710 bytes expanded shared size
  42,360,997,646 bytes with rep-sharing off
  1,800,236 shared references

Directory property representation statistics:
  116,243 bytes in 1,742 reps
  78,252 bytes in 1,100 shared reps
  193,351 bytes expanded size
  106,096 bytes expanded shared size
  4,082,036 bytes with rep-sharing off
  68,921 shared references

File property representation statistics:
  24,251 bytes in 306 reps
  18,453 bytes in 239 shared reps
  26,283 bytes expanded size
  18,931 bytes expanded shared size
  62,169,296 bytes with rep-sharing off
  1,213,859 shared references
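
For anyone who wants to re-derive the ratios quoted in the findings, here is a minimal Python sketch. It is not part of fsfs-stats. The repository labels are an assumption: the three dumps carry no names, so they are taken to be in the order the repositories are mentioned (Apache, KDE, wordpress), which at least fits the >1TB expanded size of the second dump. The field names and the `derive` helper are made up for illustration, and the choice of which figures stand behind each finding (file representations for rep sharing, on-disk property reps for bytes / rev) is a guess that happens to match the quoted numbers.

```python
# Figures copied from the three dumps. Labels are an ASSUMPTION: the dumps
# are taken to be in the order Apache, KDE, wordpress.
REPOS = {
    "Apache": dict(
        total=43_571_200_544, revisions=1_407_978,
        changes=1_719_438_790, noderevs=8_527_042_341,
        reps=32_606_404_032, reps_expanded=175_991_665_585,
        file_expanded=160_724_606_881, file_unshared=215_992_591_222,
        dir_prop_reps=7_936_967, file_prop_reps=5_808_928,
    ),
    "KDE": dict(
        total=42_516_758_377, revisions=1_325_037,
        changes=2_112_852_964, noderevs=9_918_750_627,
        reps=29_614_818_603, reps_expanded=1_114_881_994_595,
        file_expanded=1_087_898_597_508, file_unshared=1_126_563_329_834,
        dir_prop_reps=1_465_071, file_prop_reps=1_369_889,
    ),
    "wordpress": dict(
        total=8_233_212_081, revisions=507_189,
        changes=336_363_580, noderevs=1_205_197_688,
        reps=6_610_608_683, reps_expanded=416_559_053_291,
        file_expanded=18_096_237_254, file_unshared=42_360_997_646,
        dir_prop_reps=116_243, file_prop_reps=24_251,
    ),
}


def derive(s):
    """Return the ratios discussed in the findings for one repository."""
    return {
        # expanded representation size relative to on-disk representation size
        "compression": s["reps_expanded"] / s["reps"],
        # factor by which rep sharing shrinks the expanded file content
        "sharing_factor": s["file_unshared"] / s["file_expanded"],
        # the same effect expressed as a fraction saved
        "sharing_savings": 1 - s["file_expanded"] / s["file_unshared"],
        # share of the on-disk size taken by noderevs plus changes lists
        "metadata_share": (s["noderevs"] + s["changes"]) / s["total"],
        # on-disk property representation bytes per revision
        "prop_bytes_per_rev":
            (s["dir_prop_reps"] + s["file_prop_reps"]) / s["revisions"],
    }


for name, stats in REPOS.items():
    d = derive(stats)
    print(f"{name:10s} compression x{d['compression']:.1f}, "
          f"sharing x{d['sharing_factor']:.2f} "
          f"({d['sharing_savings']:.1%} saved), "
          f"metadata {d['metadata_share']:.1%}, "
          f"props {d['prop_bytes_per_rev']:.2f} bytes/rev")
```

Under those assumptions the numbers line up with the findings: roughly 25.6% saved by rep sharing and about 9.8 property bytes per revision for the first dump, about 3.4% saved for the second, a sharing factor of about 2.3 for the third, and a noderev + changes share between about 19% and 28% across all three.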