Re: [fossil-users] Fossil 2.1: Scaling

Richard Boehme Mon, 02 Mar 2015 06:37:45 -0800

One question that arises is: how do I define what a "server" is? Can I
get the complete repository history for most files, but a more limited
history for files that are larger than a certain size or that have
certain extensions?
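A rough sketch, in Python, of the kind of per-file policy this question asks about. Everything here is hypothetical: the `HistoryPolicy` name, the 10 MiB threshold, the extension list and the 30-day window are made up for illustration and are not an existing Fossil setting. The policy returns `None` for "full history" and a number of days for "limited history".

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical policy: thresholds and extensions are illustrative
# assumptions, not an existing Fossil option.
@dataclass(frozen=True)
class HistoryPolicy:
    max_full_history_bytes: int = 10 * 1024 * 1024  # files above 10 MiB get limited history
    limited_extensions: frozenset = frozenset({".mp4", ".mov", ".png", ".jpg"})
    limited_days: int = 30  # keep roughly a month of history for large/media files

    def history_days(self, path: str, size: int) -> Optional[int]:
        """Return None for full history, or the number of days of history to keep."""
        ext = PurePosixPath(path).suffix.lower()
        if size > self.max_full_history_bytes or ext in self.limited_extensions:
            return self.limited_days
        return None

policy = HistoryPolicy()
print(policy.history_days("src/main.c", 40_000))          # None (full history)
print(policy.history_days("media/intro.MP4", 2_000_000))  # 30
```

Matching on size or extension is just the simplest reading of the question; a real design would also need the server to agree on the same policy when answering sync requests.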


How would this work with sub-repositories? (Sorry, I'm not very well
versed in Fossil, but I understand that there can be sub-repositories
nested under the main one, for instance for a directory that contains a
lot of videos or images.)

Thanks.

Richard


On 3/2/15, Richard Hipp <d...@sqlite.org> wrote:
> Ben Pollack's essay at
> http://bitquabit.com/post/unorthodocs-abandon-your-dvcs-and-return-to-sanity/
> succinctly points up some of the problems with DVCS versus centralized
> VCS (like subversion).  Much further discussion occurs on the various
> news aggregator sites.
>
> So I was thinking, could Fossil 2.0 be enhanced in ways to support
> scaling to the point where it works on really massive projects?
>
> The key idea would be to relax the requirement that each client load
> the entire history of the project.  Instead, a clone would only load a
> limited amount of history (a month, a year, perhaps even just the most
> recent check-in).  This would make cloning much faster and the
> resulting clone much smaller.  Missing content could be downloaded
> from the server on an as-needed basis.  So, for example, if the user
> does "fossil update trunk:2010-01-01" then the local client would
> first have to go back to the server to fetch content from 2010.  The
> additional content would be added to the local repository.  And so the
> repository would still grow.  But it grows only on an as-needed basis
> rather than starting out at full size.  And in the common case where
> the developer never needs to look at any content over a few months
> old, the growth is limited.
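A minimal sketch of the "fetch on demand, then keep it" behaviour described above, assuming a hypothetical `fetch_from_server` callback that stands in for whatever sync request would retrieve one artifact by its ID. The clone starts with only recent artifacts and grows by one artifact per cache miss.

```python
from typing import Callable, Dict

class LazyArtifactStore:
    """Local artifact cache that falls back to the server on a miss.

    `fetch_from_server` is a hypothetical stand-in for a sync request
    that retrieves one artifact by its ID.
    """

    def __init__(self, fetch_from_server: Callable[[str], bytes]) -> None:
        self._local: Dict[str, bytes] = {}
        self._fetch = fetch_from_server
        self.server_requests = 0

    def add(self, artifact_id: str, content: bytes) -> None:
        """Store an artifact received during the initial (shallow) clone."""
        self._local[artifact_id] = content

    def get(self, artifact_id: str) -> bytes:
        if artifact_id not in self._local:
            # Missing locally: download once and keep it, so the repository
            # grows only as far as the user actually needs.
            self.server_requests += 1
            self._local[artifact_id] = self._fetch(artifact_id)
        return self._local[artifact_id]

    def __len__(self) -> int:
        return len(self._local)

# Made-up artifact IDs and contents.
server = {"a1": b"content from 2010", "b2": b"recent content"}
store = LazyArtifactStore(lambda artifact_id: server[artifact_id])
store.add("b2", server["b2"])  # shallow clone: recent content only
store.get("a1")                # e.g. an update to an old check-in
store.get("a1")                # second access is served locally
print(store.server_requests, len(store))  # 1 2
```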
>
> By downloading the meta-data that is currently computed locally by
> "rebuild", many operations on older content, such as timelines or
> search, could be performed even without having the data present.  In
> the "bsd-src.fossil" repository, the content is 78% of the repository
> file and the meta-data is the other 22%.  So a clone that stored only
> the most recent content together with all metadata might be about
> one quarter the size of a full clone.  For even greater savings, perhaps the
> metadata could be time-limited, though not as severely as the content.
> So perhaps the clone would only initialize to the last month of
> content and the last five years of metadata.
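The size estimate above can be checked with a simple linear model: a partial clone keeps some fraction of the content and some fraction of the meta-data. The 78%/22% split is the bsd-src figure from the post; the 3% of content kept and the "5 of 22 years" meta-data fraction are assumptions for illustration, and the model ignores delta compression, which in practice makes recent content a larger share than its age alone suggests.

```python
def clone_fraction(content_share: float, meta_share: float,
                   content_kept: float, meta_kept: float = 1.0) -> float:
    """Estimated size of a partial clone as a fraction of a full clone.

    content_share / meta_share: how the full repository splits between
    artifact content and meta-data (0.78 / 0.22 for bsd-src per the post).
    content_kept / meta_kept: fraction of each that the partial clone retains.
    """
    return content_share * content_kept + meta_share * meta_kept

# All meta-data plus an assumed 3% of content: about one quarter.
print(round(clone_fraction(0.78, 0.22, content_kept=0.03), 3))  # 0.243

# Five of 22 years of meta-data, assuming meta-data grew evenly over time.
print(round(clone_fraction(0.78, 0.22, content_kept=0.03, meta_kept=5 / 22), 3))  # 0.073
```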
>
> For "wide" repositories (such as bsd-src) that hold many thousands of
> files in a single check-out, Fossil could be enhanced to allow
> cloning, checkout, and commit of just a small slice of the entire
> tree.  So, for example, a clone might hold just the bin/ subdirectory
> of bsd-src containing just 56 files, rather than all 147720 files of a
> complete check-out.  Fossil should be able to do everything it
> normally does with just this subset, including committing changes,
> except that the R-card would have to be omitted from new manifests
> generated by the commit, since the entire tree is necessary to compute
> the R-card.  But the R-card is already optional, controlled by the
> "repo-cksum" setting, which is turned off in bsd-src, so there would
> be no loss in functionality.
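A sketch of why a slice of the tree cannot carry an R-card: the checksum covers every file in the check-out, so a clone holding only bin/ gets a different answer. The per-file byte layout below (name, a space, decimal size, a newline, then the content, in sorted name order) is modelled on the description of the R-card in Fossil's file-format notes but should be treated as an assumption; `tree_checksum`, `slice_tree` and `r_card` are hypothetical names.

```python
import hashlib
from typing import Dict, Optional

def tree_checksum(files: Dict[str, bytes]) -> str:
    """Whole-tree MD5 in the spirit of Fossil's R-card.

    Assumed per-file layout, in sorted name order: the name, a space,
    the decimal size, a newline, then the raw content.
    """
    md5 = hashlib.md5()
    for name in sorted(files):
        data = files[name]
        md5.update(f"{name} {len(data)}\n".encode())
        md5.update(data)
    return md5.hexdigest()

def slice_tree(files: Dict[str, bytes], prefix: str) -> Dict[str, bytes]:
    """Keep only the files under one directory, e.g. "bin/"."""
    return {name: data for name, data in files.items() if name.startswith(prefix)}

def r_card(files: Dict[str, bytes], is_partial: bool) -> Optional[str]:
    """Return an R-card line, or None when only a slice of the tree is present."""
    if is_partial:
        # The checksum covers every file, so a slice cannot produce it.
        return None
    return "R " + tree_checksum(files)

# Made-up paths and contents, for illustration only.
full = {
    "bin/cat/cat.c": b"/* cat */\n",
    "bin/ls/ls.c": b"/* ls */\n",
    "sys/kern/init_main.c": b"/* kernel */\n",
}
partial = slice_tree(full, "bin/")
print(sorted(partial))                                # ['bin/cat/cat.c', 'bin/ls/ls.c']
print(tree_checksum(partial) == tree_checksum(full))  # False
print(r_card(partial, is_partial=True))               # None
```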
>
> Tickets and wiki in a clone might be similarly limited to (say) the
> previous 12 months of content, or the most recent change, whichever is
> larger.
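The "previous 12 months, or the most recent change, whichever is larger" rule can be expressed as a small selection function. A sketch, assuming 365 days as the stand-in for "12 months" and a hypothetical `Change` record for a ticket or wiki edit. If anything falls inside the window, the latest change is among it, so the window is the larger set; otherwise only the latest change is kept.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple

class Change(NamedTuple):
    """Hypothetical record for one ticket or wiki change."""
    when: datetime
    summary: str

def changes_to_clone(changes: List[Change], now: datetime,
                     window: timedelta = timedelta(days=365)) -> List[Change]:
    """Changes inside `window`, or just the single latest change if none are."""
    if not changes:
        return []
    recent = [c for c in changes if c.when >= now - window]
    if recent:
        return sorted(recent, key=lambda c: c.when)
    return [max(changes, key=lambda c: c.when)]

now = datetime(2015, 3, 2)
wiki = [Change(datetime(2012, 5, 1), "old page edit")]
tickets = [
    Change(datetime(2013, 1, 1), "a"),
    Change(datetime(2014, 6, 1), "b"),
    Change(datetime(2015, 2, 1), "c"),
]
print([c.summary for c in changes_to_clone(wiki, now)])     # ['old page edit']
print([c.summary for c in changes_to_clone(tickets, now)])  # ['b', 'c']
```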
>
> With these kinds of changes, it seems like Fossil might be made to
> scale to arbitrarily massive repositories on the client side.  On the
> server side, the current design would work until the repository grew
> too big to fit into a single disk file, at which point the server
> would need to be redesigned to use a client/server database, like
> PostgreSQL, that can scale to sizes larger than the 140-terabyte limit
> of SQLite.  But that would be a really big repo.  22 years of BSD
> history fits in 7.2 GB, or 61 GB uncompressed.  So it would take a
> rather larger project to get into the terabyte range.
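Some back-of-the-envelope arithmetic on the figures above, assuming decimal units (140 TB = 140,000 GB) and, very optimistically, that the repository keeps growing at its historical average rate. At about 0.33 GB per year, reaching the quoted limit would take roughly 428,000 years, or a project about 19,000 times the size of bsd-src.

```python
# Figures quoted in the post; decimal units and linear growth are assumptions.
YEARS = 22
COMPRESSED_GB = 7.2
UNCOMPRESSED_GB = 61.0
SQLITE_LIMIT_GB = 140_000.0  # the 140 TB limit cited in the post

rate_gb_per_year = COMPRESSED_GB / YEARS              # average growth so far
compression_ratio = UNCOMPRESSED_GB / COMPRESSED_GB   # uncompressed vs. stored size
years_to_limit = SQLITE_LIMIT_GB / rate_gb_per_year   # at the same linear rate
size_multiple_needed = SQLITE_LIMIT_GB / COMPRESSED_GB

print(f"{rate_gb_per_year:.3f} GB/year, {compression_ratio:.1f}x compression")
print(f"{years_to_limit:,.0f} years at that rate; {size_multiple_needed:,.0f}x bsd-src")
```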
>
> The sync protocol would need to be greatly enhanced to support this
> functionality.  Also, the schema for the meta-data, which currently is
> an implementation detail, would need to become part of the interface.
> Exposing the meta-data as interface would have been unthinkable a few
> years ago, but at this point we have accumulated enough experience
> about what is needed in the meta-data to perhaps make exposing its
> design a reasonable alternative.
>
> These are just thoughts to elicit comments and discussion.  I have
> several unrelated and much higher-priority tasks to keep me busy at
> the moment, so this is not something that would happen right away,
> unless somebody else steps up to do a lot of the implementation work.
>
> --
> D. Richard Hipp
> d...@sqlite.org
> _______________________________________________
> fossil-users mailing list
> fossil-users@lists.fossil-scm.org
> http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
>


-- 
Thank you.

Richard Boehme

Email: rboe...@gmail.com
Phone: 443-739-8502
Work Phone: 410-966-6606 (Mon - Thu 6 AM - 4:30 PM)
