I've sprinkled a bunch of extra comments around in various places.

Diff comments:

> 
> === modified file 'lib/lp/codehosting/codeimport/worker.py'
> --- lib/lp/codehosting/codeimport/worker.py   2016-10-11 15:36:32 +0000
> +++ lib/lp/codehosting/codeimport/worker.py   2016-10-11 15:36:32 +0000
> @@ -918,3 +973,67 @@
>          """See `PullingImportWorker.probers`."""
>          from bzrlib.bzrdir import BzrProber, RemoteBzrProber
>          return [BzrProber, RemoteBzrProber]
> +
> +
> +class GitToGitImportWorker(ImportWorker):
> +    """An import worker for imports from Git to Git."""
> +
> +    def _runGit(self, *args, **kwargs):
> +        """Run git with arguments, sending output to the logger."""
> +        cmd = ["git"] + list(args)
> +        git_process = subprocess.Popen(
> +            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, **kwargs)
> +        for line in git_process.stdout:
> +            line = line.decode("UTF-8", "replace").rstrip("\n")
> +            # Remove any user/password data from URLs.
> +            line = re.sub(r"://([^:]*:[^@]*@)(\S+)", r"://\2", line)
> +            self._logger.info(line)
> +        retcode = git_process.wait()
> +        if retcode:
> +            raise subprocess.CalledProcessError(retcode, cmd)
> +
> +    def _doImport(self):
> +        self._logger.info("Starting job.")
> +        self._logger.info(config.codeimport.git_repository_store)
> +        try:
> +            self._opener_policy.checkOneURL(self.source_details.url)

As far as I can see from http.c, git follows up to 20 redirects as long as 
they're to http/https/ftp/ftps schemes, and there's no way to configure this.

The only things that this would let through that checkOneURL tries to forbid 
would be redirects to git.launchpad.net and redirects to a blacklisted 
hostname, which in practice is always localhost or 127.0.0.1.  Perhaps we can 
get away with this state for the time being?  Redirects to git.launchpad.net 
would basically be a slightly annoying waste of resources but nothing more, 
while redirects to localhost should in practice be harmless given that code 
import workers don't run web or FTP servers.

The only way I can think of to improve isolation here would be to direct all 
git requests through a proxy, which would be a non-trivial chunk of work.

> +        except BadUrl as e:
> +            self._logger.info("Invalid URL: %s" % e)
> +            return CodeImportWorkerExitCode.FAILURE_FORBIDDEN
> +        unauth_target_url = urljoin(
> +            config.codeimport.git_repository_store,
> +            self.source_details.target_id)
> +        split = urlsplit(unauth_target_url)
> +        if split.hostname:
> +            target_netloc = ":%s@%s" % (
> +                self.source_details.macaroon.serialize(), split.hostname)
> +        else:
> +            target_netloc = ""
> +        target_url = urlunsplit([
> +            split.scheme, target_netloc, split.path, "", ""])
> +        # XXX cjwatson 2016-10-11: Ideally we'd put credentials in a
> +        # credentials store instead.  However, git only accepts credentials
> +        # that have both a non-empty username and a non-empty password.
> +        self._logger.info("Getting existing repository from hosting 
> service.")
> +        try:
> +            self._runGit("clone", "--bare", target_url, "repository")
> +        except subprocess.CalledProcessError as e:
> +            self._logger.info(
> +                "Unable to get existing repository from hosting service: "
> +                "%s" % e)
> +            return CodeImportWorkerExitCode.FAILURE
> +        self._logger.info("Fetching remote repository.")
> +        try:
> +            self._runGit(
> +                "remote", "add", "-f", "--mirror=fetch",
> +                "mirror", self.source_details.url, cwd="repository")

More or less, with the addition of --prune.  Though good grief HEAD handling is 
horrible, but I've just added a comment about that for now.

> +        except subprocess.CalledProcessError as e:
> +            self._logger.info("Unable to fetch remote repository: %s" % e)
> +            return CodeImportWorkerExitCode.FAILURE_INVALID
> +        self._logger.info("Pushing repository to hosting service.")
> +        try:
> +            self._runGit("push", "--mirror", target_url, cwd="repository")
> +        except subprocess.CalledProcessError as e:
> +            self._logger.info("Unable to push to hosting service: %s" % e)
> +            return CodeImportWorkerExitCode.FAILURE
> +        return CodeImportWorkerExitCode.SUCCESS
> 
> === modified file 'lib/lp/services/config/schema-lazr.conf'
> --- lib/lp/services/config/schema-lazr.conf   2016-10-11 15:36:32 +0000
> +++ lib/lp/services/config/schema-lazr.conf   2016-10-11 15:36:32 +0000
> @@ -417,7 +421,7 @@
>  svn_revisions_import_limit: 500
>  
>  # Secret key for macaroons used to grant git push permission to workers.
> -macaroon_secret_key:
> +macaroon_secret_key: none

ImplicitTypeSection._convert does:

    elif match.group('none'):
        return None

So yeah.  It works out the same either way since 
CodeImportJobMacaroonIssuer._root_secret just does "if not secret:", but I 
think None is a tidier default than the empty string.

>  
>  [codeimportdispatcher]
>  # The directory where the code import worker should be directed to


-- 
https://code.launchpad.net/~cjwatson/launchpad/codeimport-git-worker/+merge/308142
Your team Launchpad code reviewers is subscribed to branch lp:launchpad.

_______________________________________________
Mailing list: https://launchpad.net/~launchpad-reviewers
Post to     : launchpad-reviewers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~launchpad-reviewers
More help   : https://help.launchpad.net/ListHelp

Reply via email to