On Mon, Nov 20, 2017 at 01:33:11PM -0700, Warren Young wrote:
> I see a new wiki article:
> 
>     https://www.fossil-scm.org/index.html/wiki?name=Fossil-NG

There are two central design flaws in Fossil that affect larger
repositories, and those are exactly the repos that benefit most from
narrow/shallow clones. Properly addressing them is more or less a
prerequisite for either kind of clone.

(1) The need to parse all artifacts on clone. Artifacts should be
strongly typed, i.e. the system should at the very least distinguish
fully between "content" blobs and "meta data" blobs. Only the latter
have to be parsed, and only they should be. This has a number of
important implications, but the most obvious is that the number of
artifacts a rebuild or even just a sync has to look at goes down by a
factor of 2 at the very least. For something like NetBSD src or pkgsrc,
more like a factor of 10 (number of blobs in total / number of commits,
i.e. the average commit touches 10 files).
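
To make the effect of typing concrete, here is a rough sketch in Python.
It is not Fossil code: the Kind tag, the Artifact record and the toy
parse_manifest() are all made up for illustration. It only shows that a
rebuild which knows the type up front never has to open content blobs.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    # Hypothetical type tag stored alongside each blob.
    CONTENT = "content"
    META = "meta"


@dataclass(frozen=True)
class Artifact:
    kind: Kind
    data: bytes


def parse_manifest(data: bytes) -> dict:
    # Toy stand-in for a real manifest parser: one "KEY value" card per line.
    cards = {}
    for line in data.decode().splitlines():
        key, _, value = line.partition(" ")
        cards.setdefault(key, []).append(value)
    return cards


def rebuild(artifacts):
    # Only meta data blobs are parsed; content blobs are skipped unread.
    parsed, skipped = [], 0
    for artifact in artifacts:
        if artifact.kind is Kind.META:
            parsed.append(parse_manifest(artifact.data))
        else:
            skipped += 1
    return parsed, skipped


# One commit touching 10 files, as in the NetBSD estimate above.
store = [Artifact(Kind.CONTENT, b"file body %d" % i) for i in range(10)]
store.append(Artifact(Kind.META, b"C initial\nU joerg"))
parsed, skipped = rebuild(store)
```

With the 10-files-per-commit ratio, the rebuild looks inside 1 artifact
out of 11 instead of all of them.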

(2) Store true differential manifests. The current baseline approach is
a somewhat crude approximation. It has the advantage that only two
manifests have to be parsed, but it makes the average manifest size much
larger for larger file trees. The same benefit could be obtained by
caching the file list, either every so often like the current baseline
or on-demand. The difference is that the cached manifests are not
persistent meta data and don't have to be transferred.
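
A minimal sketch of the on-demand variant, in Python. The DELTAS table
and the file_list() helper are hypothetical and not how Fossil stores
anything; the point is only that each commit records just its changes
against its parent, while the full file list lives in a local cache that
is never part of the repository or the sync.

```python
from functools import lru_cache

# Hypothetical storage: commit id -> (parent id or None, changes).
# A change maps a path to its new hash, or to None for a deletion.
DELTAS = {
    "c1": (None, {"a.c": "h1", "b.c": "h2"}),
    "c2": ("c1", {"b.c": "h3"}),
    "c3": ("c2", {"a.c": None, "c.c": "h4"}),
}


@lru_cache(maxsize=128)
def file_list(commit):
    # Resolve the parent's full list (cached), then apply this delta.
    parent, changes = DELTAS[commit]
    files = dict(file_list(parent)) if parent else {}
    for path, new_hash in changes.items():
        if new_hash is None:
            files.pop(path, None)
        else:
            files[path] = new_hash
    # Return an immutable value so cached results cannot be modified.
    return tuple(sorted(files.items()))


tip = dict(file_list("c3"))
```

The recursion is fine for a sketch; a real implementation would walk
long histories iteratively and decide which full lists are worth keeping.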

I would also add a point (3) which is kind of related to (1):

(3) Make cluster manifests non-permanent artifacts. They can also
consume a good amount of space and their purpose could be served by a
Merkle tree as well. This is even more important when doing single
branch sync.
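
As a rough illustration of the Merkle tree idea, in Python: both sides
compute a root hash over the set of artifact IDs they hold and only need
to dig further when the roots differ. The merkle_root() function and the
choice of SHA-256 are assumptions for this sketch, not Fossil's sync
protocol, and nothing here has to be stored as a permanent artifact.

```python
import hashlib


def merkle_root(artifact_ids):
    # Leaves are hashes of the sorted IDs, so the root does not depend
    # on the order in which artifacts were received.
    level = [hashlib.sha256(i.encode()).digest() for i in sorted(artifact_ids)]
    if not level:
        return hashlib.sha256(b"").digest()
    while len(level) > 1:
        if len(level) % 2:
            # Duplicate the last node to pair up an odd-sized level.
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


local = {"a1", "b2", "c3"}
remote_same = ["c3", "a1", "b2"]
remote_more = local | {"d4"}

in_sync = merkle_root(local) == merkle_root(remote_same)
needs_sync = merkle_root(local) != merkle_root(remote_more)
```

For single branch sync the same computation could be run over just the
IDs reachable from that branch.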

Joerg
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
