While waiting for boarding at the Atlanta airport, I finally fleshed out
a plan about how to begin populating and using our private Savannah
Hurd-specific glibc repository / fork / whatever you name it.  Can
Roland, Samuel, and other interested parties subscribe to this scheme?

Some abbreviations:

  * R(libc) -- Savannah Hurd-specific glibc repository

  * D(libc) -- Debian libc package; this is these days based on eglibc,
    not glibc

  * P(D(x)) -- patches applied to D(x), compared to (upstream) x

  * P(y,D(x)) -- y-specific patches of P(D(x))

Objective, problems, some solutions:

  * R(libc) is to be based on glibc, not eglibc.  eglibc shouldn't
    differ much in the Hurd-specific parts, but there may still be some
    differences.

  * R(libc) has to be usable for my own libc work, as well as for other
    people's similar needs, of course:

    (a) provide a stable basis for building cross toolchains,

    (b) general libc development.

  * R(libc) needs to automatically generate the P(hurd,D(libc)) files.
    There is no reason we should continue to maintain P(hurd,D(libc)) in
    parallel, manually.  Preferably, one patch file per topic should be
    generated, not a huge combined one.

    Usually, these generated patches should be equivalent to the (a)
    subset of R(libc), but:

      * There will be patches in R(libc) that are not wanted for
        P(hurd,D(libc)), because they're already being handled outside of
        P(hurd,D(libc)) in P(D(libc)), as they're needed for other
        non-Linux architectures (k*BSD), too, for example.  There must be
        a way to exclude these.

      * There will be patches whose content is indeed wanted for
        P(hurd,D(libc)), but needs to be (slightly) frobbed before being
        usable in Debian's eglibc context.  That can be taken care of by
        additional patches in P(hurd,D(libc)); these are not derived from
        R(libc), and are either prepended or appended (as appropriate) to
        the series derived from R(libc).

      * There will be patches that are really only relevant to Debian.
        We can either still keep them in R(libc) too, or these stay
        manually maintained in P(hurd,D(libc)) as they are now.

  * The versions of the upstream libc package / the version R(libc) is
    based on may be slightly different from what D(libc) is using.  This
    shouldn't pose major problems, though, and can be handled as above.

  * All reasonable stuff from R(libc) should eventually be merged into
    the upstream glibc repository.

    Instead of staging stuff in R(libc), we could submit stuff upstream
    right from the beginning, but there are problems:

      * Upstream essentially only wants perfect patches.  Not everything
        we have at the moment is ready for prime time.  (Yet, unfinished
        patches are needed / used to base further work upon.)

      * Often, there is a huge delay between submission and acceptance
        (if at all).

  * One central repository for handling all stuff involving R(libc) is
    not sufficient; a distributed setup is needed.  This needs to
    include all meta information (dependencies between different topic
    branches, for example).

  * Upstream glibc / eglibc repositories are not at all in a usable shape
    at the moment.  Thus, simple topic branches that will eventually be
    merged into master are not possible; again, there is the problem of
    dependencies between topic branches.

  * Yet, for the usual reasons, maintaining stuff in topic branches is
    preferable to a linearized structure (simply committing stuff in the
    appropriate order in one Git branch; which would inherently resolve
    all dependencies).  There are like 30 topic branches at the moment.
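
To make the intended derivation of P(hurd,D(libc)) from R(libc) a bit more
concrete, here is a minimal sketch in Python.  Everything in it is an
assumption made for illustration: the `Topic` fields, the topic names, the
`.diff` naming, and the idea that the topics are already listed in a usable
order.  It only shows the filtering (subset (a), minus patches already
handled elsewhere in P(D(libc))) plus the manually maintained patches being
prepended or appended.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    """One topic branch of R(libc); all fields are hypothetical."""
    name: str
    toolchain: bool = True           # belongs to subset (a)
    handled_in_debian: bool = False  # already covered elsewhere in P(D(libc))


def debian_series(topics, prepend=(), append=()):
    """Return patch file names for P(hurd,D(libc)), in application order.

    `topics` is assumed to be in dependency order already.  `prepend` and
    `append` are manually maintained patches not derived from R(libc).
    """
    derived = [
        f"{t.name}.diff"
        for t in topics
        if t.toolchain and not t.handled_in_debian
    ]
    return [*prepend, *derived, *append]


# Hypothetical topic names, purely for illustration.
topics = [
    Topic("tls"),
    Topic("kfreebsd-shared-fix", handled_in_debian=True),
    Topic("experimental-signals", toolchain=False),
]
series = debian_series(topics, append=["local-eglibc-fixup.diff"])
```

Here `series` contains one file for the `tls` topic followed by the
manually maintained fixup; the other two topics are filtered out.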

Possible solutions, and their problems:

  * One Git branch: not wanted, as above.

  * Many Git (topic) branches: possible, but problem of dependencies
    between topic branches.  Possibly difficult to generate usable
    P(hurd,D(libc)) patches from that structure.

  * Quilt, <http://savannah.nongnu.org/projects/quilt>,
    <http://www.suse.de/~agruen/quilt.pdf>.  Had a brief look at it.
    Instead of topic branches, has several (topic) patch files, which are
    automatically applied in the appropriate ordering.  This is what
    D(libc) (and a lot of other packages) are doing right now.  Problem:
    unwieldy for general development; doesn't integrate with Git, thus
    not easily distributable.

  * guilt (quilt on top of git).
    Had a brief look at it.  Like Quilt, but it does integrate with Git
    in a way, though not really in a distributed fashion, as I understand
    it.  Might be feasible to use, but so far I haven't looked at it in
    much detail.

  * StGit (stacked Git), <http://www.procode.org/stgit/>.  Had a brief
    look at it.  Mostly like guilt, it seems.  What's the difference
    between these two, exactly?

  * TopGit, <http://repo.or.cz/w/topgit.git>.  Had a more-than-brief look
    at it.  Like guilt and StGit, but patches' meta-data (dependencies)
    is a first-class Git citizen.  This seems to be exactly what we want
    to use.  <http://repo.or.cz/w/topgit.git?a=blob;f=README> begins like
    this, but the whole file is well worth reading:

    | TopGit - A different patch queue manager
    | -----------
    | TopGit aims to make handling of large amount of interdependent topic
    | branches easier. In fact, it is designed especially for the case
    | when you maintain a queue of third-party patches on top of another
    | (perhaps Git-controlled) project and want to easily organize, maintain
    | and submit them - TopGit achieves that by keeping a separate topic
    | branch for each patch and providing few tools to maintain the branches.
    | ---------
    | Why not use something like StGIT or Guilt or rebase -i for maintaining
    | your patch queue?  The advantage of these tools is their simplicity;
    | they work with patch _series_ and defer to the reflog facility for
    | version control of patches (reordering of patches is not
    | version-controlled at all). But there are several disadvantages -
    | for one, these tools (especially StGIT) do not actually fit well
    | with plain Git at all: it is basically impossible to take advantage
    | of the index effectively when using StGIT. But more importantly,
    | these tools horribly fail in the face of distributed environment.
    | TopGit has been designed around three main tenets:
    |   (i) TopGit is as thin layer on top of Git as possible.
    | You still maintain your index and commit using Git, TopGit will
    | only automate few indispensable tasks.
    |   (ii) TopGit is anxious about _keeping_ your history. It will
    | never rewrite your history and all metadata is also tracked by Git,
    | smoothly and non-obnoxiously. It is good to have a _single_ point
    | when the history is cleaned up, and that is at the point of inclusion
    | in the upstream project; locally, you can see how your patch has evolved
    | and easily return to older versions.
    |   (iii) TopGit is specifically designed to work in distributed
    | environment. You can have several instances of TopGit-aware repositories
    | and smoothly keep them all up-to-date and transfer your changes between
    | them.
    | As mentioned above, the main intended use-case for TopGit is tracking
    | third-party patches, where each patch is effectively a single topic
    | branch.  In order to flexibly accommodate even complex scenarios when
    | you track many patches where many are independent but some depend
    | on others, TopGit ignores the ancient Quilt heritage of patch series
    | and instead allows the patches to freely form graphs (DAGs just like
    | Git history itself, only "one level higher"). For now, you have
    | to manually specify which patches does the current one depend
    | on, but TopGit might help you with that in the future in a darcs-like
    | fashion.
    | [...]
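
As a rough illustration of why a dependency graph of topics can still yield
a Quilt-like series for P(hurd,D(libc)), here is a minimal sketch in Python.
It is not how TopGit itself works internally; the `linearize` function and
the `t/...` topic names are hypothetical, and the ordering is done with the
standard library's `graphlib` (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def linearize(deps):
    """Order topics so that every topic comes after its dependencies.

    `deps` maps a topic name to the set of topic names it depends on.
    Raises graphlib.CycleError (a ValueError) if the dependencies are cyclic.
    """
    sorter = TopologicalSorter()
    for topic in sorted(deps):
        # add(node, *predecessors): predecessors must come before node.
        sorter.add(topic, *sorted(deps[topic]))
    return list(sorter.static_order())


# Hypothetical topics: t/c needs t/a and t/b, and t/b needs t/a.
deps = {
    "t/a": set(),
    "t/b": {"t/a"},
    "t/c": {"t/a", "t/b"},
}
order = linearize(deps)
```

Any order produced this way could serve as the sequence in which one patch
file per topic is applied; independent topics may come out in any relative
order.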

Any comments before I have a go at implementing this scheme?

