Ben Pollack's essay at http://bitquabit.com/post/unorthodocs-abandon-your-dvcs-and-return-to-sanity/ succinctly points out some of the problems with DVCS versus centralized VCS (like Subversion). Much more discussion can be found on the various news aggregator sites.
So I was thinking: could Fossil 2.0 be enhanced so that it scales to really massive projects? The key idea would be to relax the requirement that each client load the entire history of the project. Instead, a clone would load only a limited amount of history (a month, a year, perhaps even just the most recent check-in). This would make cloning much faster and the resulting clone much smaller.

Missing content could be downloaded from the server as needed. For example, if the user runs "fossil update trunk:2010-01-01", the local client would first have to go back to the server to fetch content from 2010. That content would be added to the local repository, so the repository would still grow, but only as needed rather than starting out at full size. And in the common case where the developer never needs to look at content more than a few months old, the growth is limited.

By downloading the meta-data that is currently computed locally by "rebuild", many operations on older content, such as timelines or search, could be performed even without the content being present. In the "bsd-src.fossil" repository, the content is 78% of the repository file and the meta-data is the other 22%. So a clone that stored only the most recent content together with all meta-data might be about one quarter the size of a full clone. For even greater savings, the meta-data could perhaps be time-limited too, though not as severely as the content. For example, the clone might initialize with only the last month of content and the last five years of meta-data.

For "wide" repositories (such as bsd-src) that hold many thousands of files in a single check-out, Fossil could be enhanced to allow cloning, checkout, and commit of just a small slice of the entire tree. For example, a clone might hold just the bin/ subdirectory of bsd-src, containing 56 files, rather than all 147,720 files of a complete check-out.
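A rough sketch of how a client might combine the time-limited clone and the single-slice clone described above. Everything here is hypothetical: CheckIn, PartialClone, fetch_from_server, the cutoff date, and the in-memory dictionaries standing in for the repository and the sync protocol are invented for illustration and are not Fossil's actual schema or API. Meta-data for every check-in is kept, content is downloaded up front only for recent check-ins inside the slice, and anything else is fetched from the server the first time it is needed.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, List, Set


@dataclass
class CheckIn:
    """Hypothetical meta-data for one check-in: its date and a manifest
    mapping each path in the check-out to the hash of its content."""
    when: date
    files: Dict[str, str]


@dataclass
class PartialClone:
    """Hypothetical partial clone: all check-in meta-data is kept, but
    content is downloaded up front only for check-ins on or after
    `cutoff` and only for paths under `prefix` (the slice)."""
    checkins: Dict[str, CheckIn]
    cutoff: date
    prefix: str
    fetch_from_server: Callable[[str], bytes]  # stand-in for the sync protocol
    blobs: Dict[str, bytes] = field(default_factory=dict)
    on_demand: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Initial clone: recent content inside the slice only.
        for h in self.initial_hashes():
            self.blobs[h] = self.fetch_from_server(h)

    def in_slice(self, path: str) -> bool:
        return path.startswith(self.prefix)

    def initial_hashes(self) -> Set[str]:
        return {
            h
            for ci in self.checkins.values()
            if ci.when >= self.cutoff
            for path, h in ci.files.items()
            if self.in_slice(path)
        }

    def timeline(self) -> List[str]:
        # Answered from meta-data alone; no content is needed.
        return sorted(self.checkins, key=lambda cid: self.checkins[cid].when, reverse=True)

    def get_blob(self, h: str) -> bytes:
        # Fetch missing content once, then keep it: the local
        # repository grows only as older history is actually used.
        if h not in self.blobs:
            self.blobs[h] = self.fetch_from_server(h)
            self.on_demand.append(h)
        return self.blobs[h]

    def checkout(self, checkin_id: str) -> Dict[str, bytes]:
        # Materialize only the files of the check-in inside the slice.
        ci = self.checkins[checkin_id]
        return {p: self.get_blob(h) for p, h in ci.files.items() if self.in_slice(p)}


if __name__ == "__main__":
    server = {"h1": b"old ls", "h2": b"new ls", "h3": b"kernel"}
    checkins = {
        "a": CheckIn(date(2010, 1, 1), {"bin/ls.c": "h1", "sys/kern.c": "h3"}),
        "b": CheckIn(date(2015, 6, 1), {"bin/ls.c": "h2", "sys/kern.c": "h3"}),
    }
    clone = PartialClone(checkins, date(2015, 1, 1), "bin/", server.__getitem__)
    print(sorted(clone.blobs))   # ['h2']: only recent, in-slice content
    print(clone.timeline())      # ['b', 'a']: works without old content
    print(clone.checkout("a"))   # {'bin/ls.c': b'old ls'}: fetched on demand
    print(clone.on_demand)       # ['h1']
```

Fetching one artifact per request keeps the sketch short; a real sync protocol would presumably batch such requests.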
Fossil should be able to do everything it normally does with just this subset, including committing changes, except that new manifests generated by a commit would have to omit the R-card, since computing the R-card requires the entire tree. But the R-card is already optional, controlled by the "repo-cksum" setting, which is turned off in bsd-src, so there would be no loss in functionality.

Tickets and wiki in a clone might be similarly limited to (say) the previous 12 months of content, or the most recent change, whichever is larger.

With these kinds of changes, it seems like Fossil might be made to scale to arbitrarily massive repositories on the client side. On the server side, the current design would work until the repository grew too big to fit into a single disk file. At that point the server would need to be redesigned to use a client/server database like PostgreSQL that can scale beyond the 140-terabyte limit of SQLite. But that would be a really big repo. Twenty-two years of BSD history fits in 7.2 GB (61 GB uncompressed), so it would take a much larger project to get into the terabyte range.

The sync protocol would need to be greatly enhanced to support this functionality. Also, the schema for the meta-data, which is currently an implementation detail, would need to become part of the interface. Exposing the meta-data as an interface would have been unthinkable a few years ago, but at this point we have accumulated enough experience about what is needed in the meta-data that exposing its design may be a reasonable alternative.

These are just thoughts to elicit comments and discussion. I have several unrelated and much higher-priority tasks keeping me busy at the moment, so this is not something that would happen right away, unless somebody else steps up to do a lot of the implementation work.

-- 
D. Richard Hipp
d...@sqlite.org