Jeff King <> writes:

> Thanks, I think this is a real readability improvement in most cases.
> ...
> I tried:
>   perl -i -lpe '
>     s/memcmp\(([^,]+), "(.*?)", (\d+)\)/
>      length($2) == $3 ?
>        qq{!starts_with($1, "$2")} :
>        $&
>     /ge
>   ' $(git ls-files '*.c')
> That comes up with almost the same results as yours, but misses a few
> cases that you caught (in addition to the need to simplify
> !!starts_with()):
>   - memcmp(foo, "bar", strlen("bar"))
>   - strings with interpolation, like "foo\n", which is actually 4
>     characters in length, not 5.
> But there were only a few of those, and I hand-verified them. So I feel
> pretty good that the changes are all correct transformations.
> That leaves the question of whether they are all improvements in
> readability (more on that below).

After reading the patch and the result of your Perl replacement, I'd
agree with the "correctness" but I am not as convinced as you seem
to be about the "real readability improvement in most cases."  "some
cases", perhaps, though.

Taking two random examples from an early and a late parts of the

--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -82,7 +82,7 @@ static int cat_one_file(int opt, const char *exp_type, const 
char *obj_name)
                                enum object_type type;
                                unsigned long size;
                                char *buffer = read_sha1_file(sha1, &type, 
-                               if (memcmp(buffer, "object ", 7) ||
+                               if (!starts_with(buffer, "object ") ||
                                    get_sha1_hex(buffer + 7, blob_sha1))
                                        die("%s not a valid tag", 

diff --git a/transport.c b/transport.c
index ca7bb44..a267822 100644
--- a/transport.c
+++ b/transport.c
@@ -1364,7 +1364,7 @@ static int refs_from_alternate_cb(struct 
alternate_object_database *e,
        while (other[len-1] == '/')
                other[--len] = '\0';
-       if (len < 8 || memcmp(other + len - 8, "/objects", 8))
+       if (len < 8 || !starts_with(other + len - 8, "/objects"))
                return 0;
        /* Is this a git repository with refs? */
        memcpy(other + len - 8, "/refs", 6);

The original hunks show that the code knows and relies on magic
numbers 7 and 8 very clearly and there are rooms for improvement.
The result after the conversion, however, still have the same magic
numbers, but one less of them each.  Doesn't it make it harder to
later spot the patterns to come up with a better abstraction that
does not rely on the magic number?  Especially in the first hunk, we
can spot the repeated 7s in the original that make it glaringly
clear that we might want a better abstraction there, but in the text
after the patch, there is less clue that encourages us to take a
look at that "buffer + 7" further to make sure we do not feed a
wrong string to get_sha1_hex() by mistake when we update the
surrounding code or the data format this codepath parses.

I think grepping for "memcmp()" that compares the same number of
bytes as a constant string, like you showed in that Perl scriptlet,
is a very good first step to locate where we might want to look for
improvements.  I however think that a mechanical replacement of all
such memcmp() with !start_with() is of a dubious value.

After finding the hunk in the cat-file.c shown above, and after
seeing many other similar patterns, one may realize that it is a
good use case for a variant of skip_prefix() that returns boolean,
which we discussed earlier, perhaps like so:

        if (!skip_over(buffer, "object ", &object_name) ||
            get_sha1_hex(object_name, blob_sha1))

and such a solution would be what truly eradicates the reliance of
magic constants that might break by miscounting bytes in string
constants.  I do not think mechanical replacement to !start_with()
is "going in the right direction and reaching a halfway to that
goal".  I honestly think that it instead is taking us into a wrong
direction, without really solving the use of brittle magic constants
and making remaining reliance of them even harder to spot.

To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to
More majordomo info at

Reply via email to