My latest thoughts on Fossil-2.0 are in the attachment.  (I think I
have successfully modified the mailing list settings to allow text
attachments through - this message will serve as a test case.)
-- 
D. Richard Hipp
d...@sqlite.org
The latest thinking on the Fossil-2.0 upgrade.

(1) Keep the BLOB.UUID column.  The value in BLOB.UUID is the display name
    for an artifact.  Most artifacts have only this one name.  For older
    artifacts the name will be a SHA1 hash.  For newer artifacts the display
    name will be the SHA3-256 hash (or some other hash).

(2) Add a new ALIAS table as follows:

       CREATE TABLE alias(
         hval TEXT,                      -- Hex-encoded hash value
         htype ANY,                      -- hash type
         rid INTEGER REFERENCES blob,    -- Blob that this hash names
         PRIMARY KEY(hval,htype,rid)
       ) WITHOUT ROWID;
      CREATE INDEX alias_rid ON alias(rid);

    This alias table will hold alternative names for artifacts.  If the
    display name is SHA3-256, there might be a SHA1 alias.  Fossil will
    work to keep the number of aliases to a minimum.  Most artifacts will
    have only a display name and no aliases.  Many repositories will have
    no aliases at all.

    Once aliases are registered for an artifact, the artifact can be referred
    to using either its display name or any of its aliases.
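
As a rough illustration of the lookup described in (2), here is a minimal
Python sketch using the standard sqlite3 module.  The two-column blob table
is a simplified stand-in for the real BLOB table, and open_demo_repo() and
resolve_name() are hypothetical helper names, not part of Fossil.  The
primary key is written over the declared rid column.

```python
import sqlite3

def open_demo_repo():
    """Create an in-memory database with a stand-in blob table and the
    proposed alias table (assumption: only rid and uuid matter here)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE blob(
          rid INTEGER PRIMARY KEY,
          uuid TEXT UNIQUE NOT NULL       -- display name
        );
        CREATE TABLE alias(
          hval TEXT,                      -- Hex-encoded hash value
          htype ANY,                      -- hash type
          rid INTEGER REFERENCES blob,    -- Blob that this hash names
          PRIMARY KEY(hval,htype,rid)
        ) WITHOUT ROWID;
        CREATE INDEX alias_rid ON alias(rid);
    """)
    return conn

def resolve_name(conn, name):
    """Return the rid for a display name or an alias, or None if unknown."""
    row = conn.execute(
        "SELECT rid FROM blob WHERE uuid = ?", (name,)).fetchone()
    if row is None:
        # Fall back to the alias table only when the display name misses.
        row = conn.execute(
            "SELECT rid FROM alias WHERE hval = ?", (name,)).fetchone()
    return row[0] if row else None
```

Checking BLOB.UUID first keeps the common case (no aliases at all) to a
single indexed lookup.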

(3) The repository keeps a list of all hash algorithms used.  For new
    repositories, this list will be a singleton: SHA3-256.  For legacy
    repositories, the list will be of length two:  SHA3-256, SHA1.  The
    first algorithm on the list is the preferred algorithm and is the hash
    used for new artifacts added by a "fossil commit".

(4) As each new artifact is added by "fossil commit", all possible hash
    names must be computed, in order to check whether that artifact is
    already in the repository.  If the new artifact is already in the
    repository, it takes on the display name of the preexisting artifact.
    If the artifact has never been seen before, the preferred hash algorithm
    is used for the display name and the other hashes are discarded.
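
The naming rule in (4) can be sketched in Python with the standard hashlib
module.  This is only an illustration under stated assumptions: the
HASH_ALGORITHMS list, the function names, and the known_names mapping (any
known name, display or alias, mapped to the artifact's display name) are
hypothetical and not taken from Fossil.

```python
import hashlib

# Algorithms used by the repository, preferred algorithm first.
# (Assumption: a legacy repository that has transitioned to SHA3-256.)
HASH_ALGORITHMS = ["sha3_256", "sha1"]

def all_hash_names(content, algorithms=HASH_ALGORITHMS):
    """Compute the hex hash name of content under every algorithm."""
    return {alg: hashlib.new(alg, content).hexdigest() for alg in algorithms}

def display_name_for(content, known_names, algorithms=HASH_ALGORITHMS):
    """Pick the display name for content that is about to be committed.

    known_names maps every name already in the repository to the display
    name of the artifact it refers to.
    """
    names = all_hash_names(content, algorithms)
    for alg in algorithms:
        if names[alg] in known_names:
            # Artifact already present: reuse the preexisting display name.
            return known_names[names[alg]]
    # Never seen before: use the preferred hash, discard the others.
    return names[algorithms[0]]
```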

(5) During synchronization, if one side knows the artifact only by its SHA1
    name and the other side knows the artifact only by its SHA3-256 name, then
    the two sides will not know that they are holding the same artifact.
    The artifact content will be sent over the wire unnecessarily. But once
    that happens, both sides will register aliases and no further unnecessary
    syncing will occur.  It is expected that this unnecessary syncing will
    be very rare.

(6) Check-in manifests and other structural artifacts are allowed to contain
    a mixture of hash types.  A check-in that occurs after transitioning a
    project from SHA1 to SHA3-256 will identify older files using their SHA1
    hashes and will identify files that have changed since the transition by
    their SHA3-256 hashes.

(7) If a Fossil-2.0 repository contains only SHA1 display names, then it will
    sync with an older Fossil-1.x peer.  However, the Fossil-1.x peer will
    complain about protocol errors if artifacts with display names other
    than SHA1 are used.

(8) There are no changes to the sync protocol, other than relaxing the
    constraint on hash length.  For Fossil-1.x, the hash length must be
    exactly 40 characters.  For Fossil-2.0, the hash length must be 40
    characters or more.

(9) There are no changes to the file formats, other than relaxing the size
    constraint on artifact hashes: a hash may be 40 characters or longer,
    rather than being required to be exactly 40 characters.
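
The relaxed length constraint from (8) and (9) amounts to changing an
equality test into a lower bound.  A minimal Python sketch follows; the
function names are hypothetical, and restricting names to lowercase hex
digits is an assumption added for the example (the note itself only
constrains the length).

```python
HEX_DIGITS = set("0123456789abcdef")  # assumption: lowercase hex only

def is_valid_hash_v1(name):
    """Fossil-1.x rule: exactly 40 characters."""
    return len(name) == 40 and set(name) <= HEX_DIGITS

def is_valid_hash_v2(name):
    """Fossil-2.0 rule: 40 characters or more."""
    return len(name) >= 40 and set(name) <= HEX_DIGITS
```

A 64-character SHA3-256 name passes the second check but not the first,
which is why a Fossil-1.x peer reports a protocol error when it sees one.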

(10) Probably:  If the display name for an artifact is shorter than an
    alias name, then the display name and alias name will swap places.
    In this way, if the same artifact is referenced by both its SHA3-256
    name and its SHA1 name, then the SHA3-256 name will automatically become
    the display name.
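
One possible reading of the swap in (10), sketched in Python.  The function
name is hypothetical, and choosing the longest alias when there are several
is an assumption; the note only describes the two-name case.

```python
def promote_longest_name(display_name, aliases):
    """Return (display_name, aliases) after applying the swap rule.

    If some alias is longer than the current display name, the longest
    alias becomes the display name and the old display name becomes an
    alias.  Otherwise nothing changes.
    """
    longest = max(aliases, key=len, default=display_name)
    if len(longest) <= len(display_name):
        return display_name, list(aliases)
    remaining = [a for a in aliases if a != longest]
    return longest, remaining + [display_name]
```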

(11) URLs and command-line arguments can use either the display name or any
     of the aliases for an artifact.

(12) Web pages that show details about an artifact will be titled by the
     display name, but will also show all aliases.

(13) Repositories will have the option to reject newer content that uses
     SHA1 hash names.