Hello, all.

After a few days of thinking, discovering and working, here it is:
the first working draft of a new git eclass, codenamed 'git-r3'.

First of all, the name is not final. I'm open to ideas. I'm open to
naming it 'git-r1' to put it in line with my other -r1 eclasses :).
I'd definitely like to avoid 'git-3' though, since that version-like
naming was a mistake, as the almost-'python-2' eclass has shown.

Secondly, it's not even final that there will be a new eclass. Most
likely I will commit it as a new eclass since that is easier for us,
but if you prefer, I may try to make it more API-compatible with git-2
and turn it into an almost-drop-in replacement. After all, the
internals have changed much more than the API.


And now for the major changes:

1. The code has been split into clean 'fetch' and 'checkout' pieces.

That is, it is suited for the distinct src_fetch() and src_unpack()
phases that we'll hopefully have in EAPI 6. Importantly, the checkout
code does not rely on passing *any* environment variables from the
fetching code. It is also made with concurrency in mind, so multiple
ebuilds using the same repository at the same time shouldn't be
a problem.

2. Public fetch/checkout API.

git-2 has a lot of private functions and just src_unpack(). git-r3 has
git-r3_fetch() and git-r3_checkout(), which are public API intended to
be used in ebuilds that need more than plain fetch+unpack. While this
isn't exactly what those asking for multi-repo support wanted, it should
make supporting multiple repos in one ebuild much cleaner.
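
To illustrate, here is a rough sketch of how an ebuild might pull
a second repository through the public functions. The stubs, the package
values, the 'data' naming and the example.org URIs are all made up so
that the snippet runs outside Portage; in a real ebuild, 'inherit git-r3'
would supply the functions.

```bash
# Stand-ins for the eclass functions (assumption: only needed outside
# Portage, where 'inherit git-r3' has not defined them). They just log.
declare -F git-r3_src_unpack >/dev/null || git-r3_src_unpack() { echo "default unpack"; }
declare -F git-r3_fetch >/dev/null || git-r3_fetch() { echo "fetch: $*"; }
declare -F git-r3_checkout >/dev/null || git-r3_checkout() { echo "checkout: $*"; }

# Hypothetical package values that Portage would normally provide.
: "${CATEGORY:=app-misc}" "${PN:=foo}" "${SLOT:=0}" "${P:=foo-9999}"
: "${WORKDIR:=/tmp/work}"

# Hypothetical second repository, with a fallback URI.
EXTRA_REPO_URI="git://example.org/foo-data.git https://example.org/git/foo-data.git"

src_unpack() {
	# main repository: plain fetch + checkout via the exported phase
	git-r3_src_unpack

	# <local-id> must be unique per fetch, so derive it from the package
	local id=${CATEGORY}/${PN}/${SLOT}/data

	# git-r3_fetch <repo-uri> <remote-ref> <local-id>
	git-r3_fetch "${EXTRA_REPO_URI}" refs/heads/master "${id}"
	# git-r3_checkout <repo-uri> <local-id> <path>
	git-r3_checkout "${EXTRA_REPO_URI}" "${id}" "${WORKDIR}/${P}/data"
}
```

The same <local-id> has to be passed to both calls, since the checkout
code finds the fetched commit through it rather than through any
variable set by the fetch.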

3. Clean submodule support with bare clones.

Since the submodules are very straightforward in design, I have decided
to move their support into the eclass directly. As a result, the new
eclass cleanly supports submodules, treating them as additional
repositories and doing submodule fetch/checkout recursively. There is
no need for non-bare clones anymore (and therefore their support has
been removed to make code simpler), and submodules work fine with
EVCS_OFFLINE=1.

4. 'Best-effort' shallow clones support.

I did my best to support shallow clones in the eclass. The code is
specifically designed to handle them whenever possible. However,
shallow clones have a few limitations, so:

a) only branch/tag-based fetches support shallow clones. Fetching by
commit id forces a complete clone (this is what submodules do BTW).

b) there's an EGIT_NONSHALLOW option for users who prefer to have full
clones, and possibly for ebuilds that fail with shallow clones.

c) if shallow clones cause even more trouble than that, I will simply
remove their support from the eclass :).

[see notes about testing at the end]
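
The shallow-or-not decision boils down to a few lines. Below is
a standalone restatement of it, assuming the commit id printed by
'git ls-remote' is passed in as the first argument; the function name
fetch_params is made up for this sketch and is not part of the eclass.

```bash
# Build the extra parameters for 'git fetch'.
# $1 = commit id printed by 'git ls-remote' (empty if the requested ref
#      is not a branch or tag, e.g. when it is a plain commit id)
# $2 = the requested remote ref
fetch_params() {
	local resolved_id=${1} remote_ref=${2}
	local params=()

	if [[ ${resolved_id} ]]; then
		# a real ref: shallow fetch unless the user opted out
		[[ ${EGIT_NONSHALLOW} ]] || params+=( --depth 1 )
		params+=( "${remote_ref}" )
	fi

	# no parameters at all means "fetch everything" (complete clone)
	printf '%s\n' "${params[*]}"
}
```

With a resolved branch this prints '--depth 1 refs/heads/master', with
EGIT_NONSHALLOW set just 'refs/heads/master', and for a commit id
an empty line.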

5. Safer default EGIT_DIR choice. EGIT_PROJECT removed.

Since submodules are cloned as separate repositories as well, we can't
afford having EGIT_PROJECT to change the clone dir. Instead, the eclass
uses full path from first repo URI (with some preprocessing) to
determine the clone location. This should prevent clone collisions
while making it very likely that two ebuilds using the same repo will
share the same clone without any special effort from the maintainer.
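
For reference, the preprocessing can be tried out in isolation. The
function below restates the path logic from _git-r3_set_gitdir; the name
repo_dir_name and the example.org URIs are only for illustration.

```bash
# Map a repository URI to the directory name used under EGIT_STORE_DIR.
repo_dir_name() {
	# drop the scheme and the host
	local repo_name=${1#*://*/}

	# strip one common prefix so git:// and https:// URIs
	# of the same repository are more likely to match
	case "${repo_name}" in
		cgit/*) repo_name=${repo_name#cgit/};;
		git/*) repo_name=${repo_name#git/};;
		gitroot/*) repo_name=${repo_name#gitroot/};;
		p/*) repo_name=${repo_name#p/};;
		pub/scm/*) repo_name=${repo_name#pub/scm/};;
	esac

	# ensure a .git suffix, then flatten the remaining slashes
	repo_name=${repo_name%.git}.git
	repo_name=${repo_name//\//_}

	printf '%s\n' "${repo_name}"
}
```

Both git://example.org/foo/bar.git and https://example.org/git/foo/bar
end up as foo_bar.git, so the two URIs share one clone.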

6. Safer default checkout dir. EGIT_SOURCEDIR removed.

git-2 used to default EGIT_SOURCEDIR=${S}. This kinda sucked since if
one wanted to use a subdirectory of the git repo, one needed to set both
EGIT_SOURCEDIR and S. Now, the checkout is done to ${WORKDIR}/${P}
by default and ebuilds can safely do S=${WORKDIR}/${P}/foo. I may
provide EGIT_SOURCEDIR if someone still finds it useful.


API/variables removed:

1. EGIT_SOURCEDIR:

a) if you need it for multiple repos, use the fetch/checkout functions
instead,

b) otherwise, play with S instead,

c) if you really need it, lemme know and I'll put it back.

2. EGIT_HAS_SUBMODULES -> no longer necessary, we autodetect them
(and we don't need that much special magic like we used to).

3. EGIT_OPTIONS -> interfered too much with eclass internals.

4. EGIT_MASTER -> people misused it all the time, and it caused issues
for projects that used a different default branch. Now we just respect
upstream's default branch.

5. EGIT_PROJECT -> should be no longer necessary.

6. EGIT_DIR -> still exported, but no longer respects user setting it.

7. EGIT_REPACK, EGIT_PRUNE -> I will probably reintroduce them, or just
provide the ability to set git auto-cleanup options.

8. EGIT_NONBARE -> only bare clones are supported now.

9. EGIT_NOUNPACK -> git-2 is the only eclass calling the default. Does
anyone actually need this? Is adding a custom src_unpack() that hard?

10. EGIT_BOOTSTRAP -> this really belongs in *your* src_prepare().


I've tested the eclass on 113 live packages I'm using. Most of them
work just fine (I've replaced git-2 with the new eclass). Some
of them only require removing the old variables, some need having S
changed. However, I noticed the following issues as well:

1. code.google fails with 500 when trying to do a shallow clone
(probably they implemented their own dumb git server),

2. sys-apps/portage wants to play with 'git log' for ChangeLogs. That's
something that definitely is not going to work with shallow clones ;).
Not that I understand why someone would like a few megs of detailed git
backlog.

3. sys-fs/bedup's btrfs-progs submodule says the given commit id is
'not a valid branch point'. Need to investigate what this means.

4. 'git fetch --depth 1' seems to be refetching stuff even when nothing
changed. Need to investigate it. It may be enough to do an additional
'did anything change?' check.
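
One possible shape for such a check, purely as a sketch and not what the
eclass currently does: compare the commit id reported by 'git ls-remote'
with the one the local branch points at, and skip the fetch when they
match. The function name needs_fetch is hypothetical.

```bash
# Succeeds (returns 0) when a fetch looks necessary.
# $1 = commit id reported by 'git ls-remote' for the requested ref
# $2 = commit id the local branch currently points at (empty if none)
needs_fetch() {
	local remote_id=${1} local_id=${2}

	# fetch if either id is unknown or if they differ
	[[ ! ${remote_id} || ! ${local_id} || ${remote_id} != "${local_id}" ]]
}

# Possible use inside the fetch loop (assumes GIT_DIR is already set):
#   remote_id=$(git ls-remote "${r}" "${remote_ref}" | cut -f1)
#   local_id=$(git rev-parse --verify -q "${local_ref}")
#   needs_fetch "${remote_id}" "${local_id}" && git fetch ...
```

Note that for an annotated tag, ls-remote reports the id of the tag
object rather than the commit, so a comparison like this would always
ask for a refetch in that case.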


I will try to look into those issues tomorrow. In the meantime, please
review this eclass and give me your thoughts. Especially if someone has
some more insight on shallow clones. Thanks.

And a fun fact: LLVM subversion checkout (in svn-src) has ~2.4k
files which consume around 220M on btrfs. LLVM git shallow clone takes
17M.

-- 
Best regards,
Michał Górny
# Copyright 1999-2013 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: $

# @ECLASS: git-r3.eclass
# @MAINTAINER:
# Michał Górny <mgo...@gentoo.org>
# @BLURB: Eclass for fetching and unpacking git repositories.
# @DESCRIPTION:
# Third generation eclass for easing maintenance of live ebuilds using
# git as remote repository. Eclass supports lightweight (shallow)
# clones, local object deduplication and submodules.

case "${EAPI:-0}" in
        0|1|2|3|4|5)
                ;;
        *)
                die "Unsupported EAPI=${EAPI} (unknown) for ${ECLASS}"
                ;;
esac

if [[ ! ${_GIT_R3} ]]; then

inherit eutils

fi

EXPORT_FUNCTIONS src_unpack

if [[ ! ${_GIT_R3} ]]; then

# @ECLASS-VARIABLE: EGIT_STORE_DIR
# @DESCRIPTION:
# Storage directory for git sources.
#
# EGIT_STORE_DIR=${DISTDIR}/git3-src

# @ECLASS-VARIABLE: EGIT_REPO_URI
# @REQUIRED
# @DESCRIPTION:
# URIs to the repository, e.g. git://foo, https://foo. If multiple URIs
# are provided, the eclass will consider them as fallback URIs to try
# if the first URI does not work.
#
# It can be overridden via env using ${PN}_LIVE_REPO variable.
#
# Example:
# @CODE
# EGIT_REPO_URI="git://a/b.git https://c/d.git";
# @CODE

# @ECLASS-VARIABLE: EVCS_OFFLINE
# @DEFAULT_UNSET
# @DESCRIPTION:
# If non-empty, this variable prevents any online operations.

# @ECLASS-VARIABLE: EGIT_BRANCH
# @DEFAULT_UNSET
# @DESCRIPTION:
# The branch name to check out. If unset, the upstream default (HEAD)
# will be used.
#
# It can be overridden via env using ${PN}_LIVE_BRANCH variable.

# @ECLASS-VARIABLE: EGIT_COMMIT
# @DEFAULT_UNSET
# @DESCRIPTION:
# The tag name or commit identifier to check out. If unset, newest
# commit from the branch will be used. If set, EGIT_BRANCH will
# be ignored.
#
# It can be overridden via env using ${PN}_LIVE_COMMIT variable.

# @ECLASS-VARIABLE: EGIT_NONSHALLOW
# @DEFAULT_UNSET
# @DESCRIPTION:
# Disable performing shallow fetches/clones. Shallow clones have
# a fair number of limitations. Therefore, if you'd like the eclass to
# perform complete clones instead, set this to a non-null value.
#
# This variable is to be set in make.conf. Ebuilds are not allowed
# to set it.

# @FUNCTION: _git-r3_env_setup
# @INTERNAL
# @DESCRIPTION:
# Set the eclass variables as necessary for operation. This can involve
# setting EGIT_* to defaults or ${PN}_LIVE_* variables.
_git-r3_env_setup() {
        debug-print-function ${FUNCNAME} "$@"

        local esc_pn livevar
        esc_pn=${PN//[-+]/_}

        livevar=${esc_pn}_LIVE_REPO
        EGIT_REPO_URI=${!livevar:-${EGIT_REPO_URI}}
        [[ ${EGIT_REPO_URI} ]] \
                || die "EGIT_REPO_URI must be set to a non-empty value"
        [[ ${!livevar} ]] \
                && ewarn "Using ${livevar}, no support will be provided"

        livevar=${esc_pn}_LIVE_BRANCH
        EGIT_BRANCH=${!livevar:-${EGIT_BRANCH}}
        [[ ${!livevar} ]] \
                && ewarn "Using ${livevar}, no support will be provided"

        livevar=${esc_pn}_LIVE_COMMIT
        EGIT_COMMIT=${!livevar:-${EGIT_COMMIT}}
        [[ ${!livevar} ]] \
                && ewarn "Using ${livevar}, no support will be provided"

        # git-2 unsupported cruft
        local v
        for v in EGIT_{SOURCEDIR,MASTER,HAS_SUBMODULES,PROJECT} \
                        EGIT_{NOUNPACK,BOOTSTRAP}
        do
                [[ ${!v} ]] && die "${v} is not supported."
        done
}

# @FUNCTION: _git-r3_set_gitdir
# @USAGE: <repo-uri>
# @INTERNAL
# @DESCRIPTION:
# Obtain the local repository path and set it as GIT_DIR. Creates
# a new repository if necessary.
#
# <repo-uri> may be used to compose the path. It should therefore be
# a canonical URI to the repository.
_git-r3_set_gitdir() {
        debug-print-function ${FUNCNAME} "$@"

        local repo_name=${1#*://*/}

        # strip common prefixes to make paths more likely to match
        # e.g. git://X/Y.git vs https://X/git/Y.git
        # (but just one of the prefixes)
        case "${repo_name}" in
                # cgit can proxy requests to git
                cgit/*) repo_name=${repo_name#cgit/};;
                # pretty common
                git/*) repo_name=${repo_name#git/};;
                # gentoo.org
                gitroot/*) repo_name=${repo_name#gitroot/};;
                # google code, sourceforge
                p/*) repo_name=${repo_name#p/};;
                # kernel.org
                pub/scm/*) repo_name=${repo_name#pub/scm/};;
        esac
        # ensure a .git suffix, same reason
        repo_name=${repo_name%.git}.git
        # now replace all the slashes
        repo_name=${repo_name//\//_}

        local distdir=${PORTAGE_ACTUAL_DISTDIR:-${DISTDIR}}
        : ${EGIT_STORE_DIR:=${distdir}/git3-src}

        GIT_DIR=${EGIT_STORE_DIR}/${repo_name}

        if [[ ! -d ${EGIT_STORE_DIR} ]]; then
                (
                        addwrite /
                        mkdir -m0755 -p "${EGIT_STORE_DIR}"
                ) || die "Unable to create ${EGIT_STORE_DIR}"
        fi

        addwrite "${EGIT_STORE_DIR}"
        if [[ ! -d ${GIT_DIR} ]]; then
                mkdir "${GIT_DIR}" || die
                git init --bare || die
        fi
}

# @FUNCTION: _git-r3_set_submodules
# @USAGE: <file-contents>
# @INTERNAL
# @DESCRIPTION:
# Parse .gitmodules contents passed as <file-contents>
# (as in "$(cat .gitmodules)"). Composes a 'submodules' array that
# contains in order (name, URL, path) for each submodule.
_git-r3_set_submodules() {
        debug-print-function ${FUNCNAME} "$@"

        local data=${1}

        # ( name url path ... )
        submodules=()

        local l
        while read l; do
                # submodule.<path>.path=<path>
                # submodule.<path>.url=<url>
                [[ ${l} == submodule.*.url=* ]] || continue

                l=${l#submodule.}
                local subname=${l%%.url=*}

                submodules+=(
                        "${subname}"
                        "$(echo "${data}" | git config -f /dev/fd/0 \
                                submodule."${subname}".url)"
                        "$(echo "${data}" | git config -f /dev/fd/0 \
                                submodule."${subname}".path)"
                )
        done < <(echo "${data}" | git config -f /dev/fd/0 -l)
}

# @FUNCTION: git-r3_fetch
# @USAGE: <repo-uri> <remote-ref> <local-id>
# @DESCRIPTION:
# Fetch new commits to the local clone of repository. <repo-uri> follows
# the syntax of EGIT_REPO_URI and may list multiple (fallback) URIs.
# <remote-ref> specifies the remote ref to fetch (branch, tag
# or commit). <local-id> specifies an identifier that needs to uniquely
# identify the fetch operation in case multiple parallel merges used
# the git repo. <local-id> usually involves using CATEGORY, PN and SLOT.
#
# The fetch operation will only affect the local storage. It will not
# touch the working copy. If the repository contains submodules, they
# will be fetched recursively as well.
git-r3_fetch() {
        debug-print-function ${FUNCNAME} "$@"

        local repos=( ${1} )
        local remote_ref=${2}
        local local_id=${3}
        local local_ref=refs/heads/${local_id}/__main__

        local -x GIT_DIR
        _git-r3_set_gitdir ${repos[0]}

        # try to fetch from the remote
        local r success
        for r in ${repos[@]}; do
                einfo "Fetching ${remote_ref} from ${r} ..."

                # first, try ls-remote to see if ${remote_ref} is a real ref
                # and not a commit id. if it succeeds, we can pass ${remote_ref}
                # to 'fetch'. otherwise, we will just fetch everything

                # split on whitespace
                local ref=(
                        $(git ls-remote "${r}" "${remote_ref}")
                )

                local ref_param=()
                if [[ ${ref[0]} ]]; then
                        [[ ${EGIT_NONSHALLOW} ]] || ref_param+=( --depth 1 )
                        ref_param+=( "${remote_ref}" )
                fi

                # if ${remote_ref} is a branch or tag, ${ref[0]} will contain
                # the respective commit id and ${ref_param[@]} the fetch
                # parameters. otherwise, ${ref_param[@]} will be an empty
                # array, so the following won't evaluate to a parameter.
                if git fetch --no-tags "${r}" "${ref_param[@]}"; then
                        if ! git branch -f "${local_id}/__main__" \
                                "${ref[0]:-${remote_ref}}"
                        then
                                die "Creating branch failed (${remote_ref} invalid?)"
                        fi
                        success=1
                        break
                fi
        done
        [[ ${success} ]] || die "Unable to fetch from any of EGIT_REPO_URI"

        # recursively fetch submodules
        if git cat-file -e "${local_ref}":.gitmodules &>/dev/null; then
                local submodules
                _git-r3_set_submodules \
                        "$(git cat-file -p "${local_ref}":.gitmodules || die)"

                while [[ ${submodules[@]} ]]; do
                        local subname=${submodules[0]}
                        local url=${submodules[1]}
                        local path=${submodules[2]}
                        local commit=$(git rev-parse "${local_ref}:${path}")

                        if [[ ! ${commit} ]]; then
                                die "Unable to get commit id for submodule ${subname}"
                        fi

                        git-r3_fetch "${url}" "${commit}" \
                                "${local_id}/${subname}"

                        submodules=( "${submodules[@]:3}" ) # shift
                done
        fi
}

# @FUNCTION: git-r3_checkout
# @USAGE: <repo-uri> <local-id> <path>
# @DESCRIPTION:
# Check the previously fetched commit out to <path> (usually
# ${WORKDIR}/${P}). <repo-uri> follows the syntax of EGIT_REPO_URI
# and will be used to re-construct the local storage path. <local-id>
# is the unique identifier used for the fetch operation and will
# be used to obtain the proper commit.
#
# If the repository contains submodules, they will be checked out
# recursively as well.
git-r3_checkout() {
        debug-print-function ${FUNCNAME} "$@"

        local repos=( ${1} )
        local local_id=${2}
        local out_dir=${3}

        local -x GIT_DIR GIT_WORK_TREE
        _git-r3_set_gitdir ${repos[0]}
        GIT_WORK_TREE=${out_dir}

        einfo "Checking out ${repos[0]} to ${out_dir} ..."

        mkdir -p "${GIT_WORK_TREE}" || die
        git checkout -f "${local_id}"/__main__ || die

        # diff against previous revision (if any)
        local new_commit_id=$(git rev-parse --verify "${local_id}"/__main__)
        local old_commit_id=$(
                git rev-parse --verify "${local_id}"/__old__ 2>/dev/null
        )

        if [[ ! ${old_commit_id} ]]; then
                echo "GIT NEW branch -->"
                echo "   repository:               ${repos[0]}"
                echo "   at the commit:            ${new_commit_id}"
        else
                echo "GIT update -->"
                echo "   repository:               ${repos[0]}"
                # write out message based on the revisions
                if [[ "${old_commit_id}" != "${new_commit_id}" ]]; then
                        echo "   updating from commit:     ${old_commit_id}"
                        echo "   to commit:                ${new_commit_id}"
                else
                        echo "   at the commit:            ${new_commit_id}"
                fi
        fi
        git branch -f "${local_id}"/{__old__,__main__} || die

        # recursively checkout submodules
        if [[ -f ${GIT_WORK_TREE}/.gitmodules ]]; then
                local submodules
                _git-r3_set_submodules \
                        "$(cat "${GIT_WORK_TREE}"/.gitmodules)"

                while [[ ${submodules[@]} ]]; do
                        local subname=${submodules[0]}
                        local url=${submodules[1]}
                        local path=${submodules[2]}

                        git-r3_checkout "${url}" "${local_id}/${subname}" \
                                "${GIT_WORK_TREE}/${path}"

                        submodules=( "${submodules[@]:3}" ) # shift
                done
        fi

        # keep this *after* submodules
        export EGIT_DIR=${GIT_DIR}
        export EGIT_VERSION=${new_commit_id}
}

git-r3_src_fetch() {
        debug-print-function ${FUNCNAME} "$@"

        [[ ${EVCS_OFFLINE} ]] && return

        _git-r3_env_setup
        local branch=${EGIT_BRANCH:+refs/heads/${EGIT_BRANCH}}
        git-r3_fetch "${EGIT_REPO_URI}" \
                "${EGIT_COMMIT:-${branch:-HEAD}}" \
                ${CATEGORY}/${PN}/${SLOT}
}

git-r3_src_unpack() {
        debug-print-function ${FUNCNAME} "$@"

        _git-r3_env_setup
        git-r3_src_fetch
        git-r3_checkout "${EGIT_REPO_URI}" \
                ${CATEGORY}/${PN}/${SLOT} \
                "${WORKDIR}/${P}"
}

_GIT_R3=1
fi
