On Mon, Oct 31, 2016 at 3:38 PM, Brandon Williams <[email protected]> wrote:
>
> +--recurse-submodules::
> + Recursively search in each submodule that has been initialized and
> + checked out in the repository.
> +
Maybe also document that we warn otherwise?
> +
> + /*
> + * Limit number of threads for child process to use.
> + * This is to prevent potential fork-bomb behavior of git-grep as each
> + * submodule process has its own thread pool.
> + */
> + if (num_threads)
> + argv_array_pushf(&submodule_options, "--threads=%d",
> + (num_threads + 1) / 2);
Just like in the run_parallel machinery this seems like an approximate
workaround. I'm OK with that for now.
Ideally the parent and child could signal each other to hand over
threads (SIGUSR1/SIGUSR2 would be enough for that, though I wonder
whether that is as portable as I would hope). Or we could look at how
"make" handles recursive calls: its jobserver hands out job tokens
over a shared pipe.
> +
> + /*
> + * Capture output to output buffer and check the return code from the
> + * child process. A '0' indicates a hit, a '1' indicates no hit and
> + * anything else is an error.
> + */
> + status = capture_command(&cp, &w->out, 0);
> + if (status && (status != 1))
Does the user have enough information about what went wrong?
Is the child verbose enough that we do not need to give a
die[_errno]("submodule process failed") here?
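Just to spell out the convention the comment describes (0 = hit, 1 =
no hit, anything else = error), a plain grep stands in for the
submodule child in this sketch:

```shell
# grep on an empty file exits 1 ("no hit"); only status > 1 is an error.
status=0
grep foobar /dev/null >/dev/null 2>&1 || status=$?

case $status in
0) result="hit" ;;
1) result="no hit" ;;
*) result="error: child exited with $status" ;;
esac
echo "$result"
```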
> +static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
> + const char *filename, const char *path)
> +{
> + if (!(is_submodule_initialized(path) &&
If it is not initialized, the user "obviously" doesn't care, so maybe
we only need to warn if it is initialized but not checked out?
> + is_submodule_checked_out(path))) {
> + warning("skipping submodule '%s%s' since it is not initialized and checked out",
> + super_prefix ? super_prefix : "",
> + path);
> + return 0;
> + }
> +
> +#ifndef NO_PTHREADS
> + if (num_threads) {
> + add_work(opt, GREP_SOURCE_SUBMODULE, filename, path, sha1);
> + return 0;
> + } else
> +#endif
> + {
> + struct work_item w;
> + int hit;
> +
> + grep_source_init(&w.source, GREP_SOURCE_SUBMODULE,
> + filename, path, sha1);
> + strbuf_init(&w.out, 0);
> + opt->output_priv = &w;
> + hit = grep_submodule_launch(opt, &w.source);
> +
> + write_or_die(1, w.out.buf, w.out.len);
> +
> + grep_source_clear(&w.source);
> + strbuf_release(&w.out);
> + return hit;
> + }
> +}
> +
> +static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec,
> + int cached)
> {
> int hit = 0;
> int nr;
> + struct strbuf name = STRBUF_INIT;
> + int name_base_len = 0;
> + if (super_prefix) {
> + name_base_len = strlen(super_prefix);
> + strbuf_addstr(&name, super_prefix);
> + }
> +
> read_cache();
>
> for (nr = 0; nr < active_nr; nr++) {
> const struct cache_entry *ce = active_cache[nr];
> - if (!S_ISREG(ce->ce_mode))
> - continue;
> - if (!ce_path_match(ce, pathspec, NULL))
> - continue;
> - /*
> - * If CE_VALID is on, we assume worktree file and its cache entry
> - * are identical, even if worktree file has been modified, so use
> - * cache version instead
> - */
> - if (cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce)) {
> - if (ce_stage(ce) || ce_intent_to_add(ce))
> - continue;
> - hit |= grep_sha1(opt, ce->oid.hash, ce->name, 0,
> - ce->name);
> + strbuf_setlen(&name, name_base_len);
> + strbuf_addstr(&name, ce->name);
> +
> + if (S_ISREG(ce->ce_mode) &&
> + match_pathspec(pathspec, name.buf, name.len, 0, NULL,
> + S_ISDIR(ce->ce_mode) ||
> + S_ISGITLINK(ce->ce_mode))) {
Why do we have to pass the S_ISDIR and S_ISGITLINK bits here for the
regular-file case? Aren't ce_path_match and match_pathspec doing the
same thing?
> + /*
> + * If CE_VALID is on, we assume worktree file and its
> + * cache entry are identical, even if worktree file has
> + * been modified, so use cache version instead
> + */
> + if (cached || (ce->ce_flags & CE_VALID) ||
> + ce_skip_worktree(ce)) {
> + if (ce_stage(ce) || ce_intent_to_add(ce))
> + continue;
> + hit |= grep_sha1(opt, ce->oid.hash, ce->name,
> + 0, ce->name);
> + } else {
> + hit |= grep_file(opt, ce->name);
> + }
> + } else if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
> + submodule_path_match(pathspec, name.buf, NULL)) {
> + hit |= grep_submodule(opt, NULL, ce->name, ce->name);
What is the difference between the last two parameters?
> + * filename: name of the submodule including tree name of parent
> + * path: location of the submodule
That sounds the same to me.
> }
>
> + if (recurse_submodules && (!use_index || untracked || list.nr))
> + die(_("option not supported with --recurse-submodules."));
The user will ask: which option?
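It may be friendlier to name the offending option in the message.
Rough sketch of what I mean (the helper and variable names are made
up, just mirroring the locals in builtin/grep.c):

```shell
# Report which incompatible option was given instead of a generic message.
conflicting_option() {
	if [ "$use_index" = 0 ]; then echo "--no-index"
	elif [ "$untracked" = 1 ]; then echo "--untracked"
	elif [ "$list_nr" != 0 ]; then echo "object specifiers"
	fi
}

use_index=1 untracked=1 list_nr=0
msg="fatal: $(conflicting_option) is not supported with --recurse-submodules"
echo "$msg"
```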
> +
> +test_expect_success 'grep and nested submodules' '
> + git init submodule/sub &&
> + echo "foobar" >submodule/sub/a &&
> + git -C submodule/sub add a &&
> + git -C submodule/sub commit -m "add a" &&
> + git -C submodule submodule add ./sub &&
> + git -C submodule add sub &&
> + git -C submodule commit -m "added sub" &&
> + git add submodule &&
> + git commit -m "updated submodule" &&
Both in this test and in the setup, we set up repositories with
submodules that have clean working directories.
What should happen with dirty working directories? Dirty in the sense of:
* a file untracked in the submodule
* a file added in the submodule, but not committed
* a file committed in the submodule, with that commit
  untracked in the superproject
* a file committed in the submodule, with that commit
  added to the index in the superproject
* (the last case is just as above:) a file committed in the submodule,
  with that commit committed into the superproject