On Sun, Aug 5, 2018 at 7:40 PM, Christian Couder
<christian.cou...@gmail.com> wrote:
> On Tue, Jul 24, 2018 at 7:11 PM, Junio C Hamano <gits...@pobox.com> wrote:

> This is the new documentation:
>
> -> Refs are grouped into islands based on their "names", and two regexes
> -> that produce the same name are considered to be in the same
> -> island. The names are computed from the regexes by concatenating any
> -> capture groups from the regex, with a '-' dash in between. (And if
> -> there are no capture groups, then the name is the empty string, as in
> -> the above example.) This allows you to create arbitrary numbers of
> -> islands. Only up to 7 such capture groups are supported though.
>
> I added the last sentence above, but I wonder if it is 7 or 8.

Actually after reading the man page again, it looks like the first
element in the array we pass is used for "the entire regular
expression's match addresses", so we only get 7 capture groups (not
8).

> The code is the following:
>
> -> static int find_island_for_ref(const char *refname, const struct
> object_id *oid,
> ->                    int flags, void *data)
> -> {
> ->     int i, m;
> ->     regmatch_t matches[8];
> ->     struct strbuf island_name = STRBUF_INIT;
> ->
> ->     /* walk backwards to get last-one-wins ordering */
> ->     for (i = island_regexes_nr - 1; i >= 0; i--) {
> ->         if (!regexec(&island_regexes[i], refname,
> ->                  ARRAY_SIZE(matches), matches, 0))
> ->             break;
> ->     }
>
> I also wonder if the above is enough to diagnose end-user input when
> it requires more captures than we support. A quick look at the man
> page of the regex functions wasn't enough to enlighten me. Any input
> on these issues is very welcome!

Taking a look at how we use regexec() in our code base, it looks like
it might be better to use regexec_buf() defined in
"git-compat-util.h".

I am not completely sure about that because apply.c has:

    status = regexec(stamp, timestamp, ARRAY_SIZE(m), m, 0);
    if (status) {
        if (status != REG_NOMATCH)
            warning(_("regexec returned %d for input: %s"),
                status, timestamp);
        return 0;
    }

Though the above uses a regex that is defined in apply.c. The regex
doesn't come from the config file.

Actually except the above there is a mix of regexec() and
regexec_buf() in our code base, which are used with only 0, 1 or 2
capture groups, so it is not very clear what should be used.

And anyway I still don't see how we could diagnose when the end user
input requires more captures than we support.

Reply via email to