One question that arises is: how do I define what a "server" is? Can I get the complete repository history for everything else but get a more limited history for files that are larger than a certain size, or that have certain extensions?
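To make the question concrete, here is a minimal sketch of what such a per-file rule could look like. Everything in it is hypothetical: `HistoryPolicy`, its fields, the size threshold, and the extension list are made up for illustration and are not an existing Fossil setting or API.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HistoryPolicy:
    """Hypothetical per-file clone policy; not an existing Fossil feature."""

    # Files larger than this get limited history (assumed threshold: 10 MiB).
    max_full_history_bytes: int = 10 * 1024 * 1024
    # Extensions that always get limited history (assumed list).
    limited_extensions: frozenset = frozenset({".mp4", ".mov", ".png", ".jpg"})
    # How many recent versions to keep for a limited file.
    limited_depth: int = 1

    def versions_to_fetch(self, path: str, size_bytes: int) -> Optional[int]:
        """Return None for full history, or the number of recent versions to keep."""
        ext = os.path.splitext(path)[1].lower()
        too_big = size_bytes > self.max_full_history_bytes
        if too_big or ext in self.limited_extensions:
            return self.limited_depth
        return None


policy = HistoryPolicy()
print(policy.versions_to_fetch("src/main.c", 4096))
print(policy.versions_to_fetch("media/intro.MP4", 4096))
print(policy.versions_to_fetch("data/dump.bin", 50 * 1024 * 1024))
```

With a rule like this, source files would keep their complete history, while large or media files would be cloned with only their most recent version and fetched from the server on demand.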
How would this work with sub-repositories? (Sorry, I am not very well versed in Fossil, but I understand that there can be sub-repositories nested under the main one, for instance for a directory that contains a lot of videos or images.)

Thanks.

Richard

On 3/2/15, Richard Hipp <d...@sqlite.org> wrote:
> Ben Pollack's essay at
> http://bitquabit.com/post/unorthodocs-abandon-your-dvcs-and-return-to-sanity/
> succinctly points up some of the problems with DVCS versus centralized
> VCS (like subversion). Much further discussion occurs on the various
> news aggregator sites.
>
> So I was thinking, could Fossil 2.0 be enhanced in ways to support
> scaling to the point where it works on really massive projects?
>
> The key idea would be to relax the requirement that each client load
> the entire history of the project. Instead, a clone would only load a
> limited amount of history (a month, a year, perhaps even just the most
> recent check-in). This would make cloning much faster and the
> resulting clone much smaller. Missing content could be downloaded
> from the server on an as-needed basis. So, for example, if the user
> does "fossil update trunk:2010-01-01" then the local client would
> first have to go back to the server to fetch content from 2010. The
> additional content would be added to the local repository. And so the
> repository would still grow. But it grows only on an as-needed basis
> rather than starting out at full size. And in the common case where
> the developer never needs to look at any content over a few months
> old, the growth is limited.
>
> By downloading the meta-data that is currently computed locally by
> "rebuild", many operations on older content, such as timelines or
> search, could be performed even without having the data present. In
> the "bsd-src.fossil" repository, the content is 78% of the repository
> file and the meta-data is the other 22%.
> So a clone that stored only
> the most recent content together with all metadata might be about
> 1/4th the size of a full clone. For even greater savings, perhaps the
> metadata could be time-limited, though not as severely as the content.
> So perhaps the clone would only initialize to the last month of
> content and the last five years of metadata.
>
> For "wide" repositories (such as bsd-src) that hold many thousands of
> files in a single check-out, Fossil could be enhanced to allow
> cloning, checkout, and commit of just a small slice of the entire
> tree. So, for example, a clone might hold just the bin/ subdirectory
> of bsd-src containing just 56 files, rather than all 147720 files of a
> complete check-out. Fossil should be able to do everything it
> normally does with just this subset, including commit changes, except
> that on new manifests generated by the commit, the R-card would have
> to be omitted since the entire tree is necessary to compute the
> R-card. But the R-card is optional already, controlled by the
> "repo-cksum" setting, which is turned off in bsd-src, so there would
> be no loss in functionality.
>
> Tickets and wiki in a clone might be similarly limited to (say) the
> previous 12 months of content, or the most recent change, whichever is
> larger.
>
> With these kinds of changes, it seems like Fossil might be made to
> scale to arbitrarily massive repositories on the client side. On the
> server side, the current design would work until the repository grew
> too big to fit into a single disk file, at which point the server
> would need to be redesigned to use a client/server database like,
> PostgreSQL, that can scale to sizes larger than the 140 terabyte limit
> of SQLite. But that would be a really big repo. 22 years of BSD
> history fits in 7.2 GB, or 61 GB uncompressed. So it would take a
> rather larger project to get into the terabyte range.
>
> The sync protocol would need to be greatly enhanced to support this
> functionality.
> Also, the schema for the meta-data, which currently is
> an implementation detail, would need to become part of the interface.
> Exposing the meta-data as interface would have been unthinkable a few
> years ago, but at this point we have accumulated enough experience
> about what is needed in the meta-data to perhaps make exposing its
> design a reasonable alternative.
>
> These are just thoughts to elicit comments and discussion. I have
> several unrelated and much higher-priority tasks to keep me busy at
> the moment, so this is not something that would happen right away,
> unless somebody else steps up to do a lot of the implementation work.
>
> --
> D. Richard Hipp
> d...@sqlite.org
> _______________________________________________
> fossil-users mailing list
> fossil-users@lists.fossil-scm.org
> http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
>

--
Thank you.
Richard Boehme
Email: rboe...@gmail.com
Phone: 443-739-8502
Work Phone: 410-966-6606 (Mon - Thu 6 AM - 4:30 PM)
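
For reference, the "about 1/4th the size" estimate in the quoted message can be checked with a few lines of arithmetic. The sketch below uses the 78%/22% split and the 7.2 GB / 61 GB figures from the message. The share of content that counts as "recent" is not given there, so `RECENT_CONTENT_SHARE` is an assumed value chosen only for illustration.

```python
def clone_fraction(metadata_share: float, recent_content_share: float) -> float:
    """Fraction of a full repository held by a clone that stores all
    metadata plus only the recent slice of the content."""
    content_share = 1.0 - metadata_share
    return metadata_share + content_share * recent_content_share


# From the quoted message: metadata is 22% of bsd-src.fossil, content 78%.
METADATA_SHARE = 0.22
# Assumption for illustration only: recent content is 4% of all content.
RECENT_CONTENT_SHARE = 0.04

fraction = clone_fraction(METADATA_SHARE, RECENT_CONTENT_SHARE)
print(f"shallow clone is roughly {fraction:.0%} of a full clone")

# From the quoted message: 7.2 GB on disk versus 61 GB uncompressed.
print(f"compression ratio is roughly {61 / 7.2:.1f}x")
```

Since the metadata alone is 22% of the file, the estimate holds as long as the retained content is only a few percent of the total.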