Re: [PATCH 18/19] diff: buffer all output if asked to
On Sat, May 13, 2017 at 9:06 PM, Jeff Kingwrote: > On Sat, May 13, 2017 at 09:01:16PM -0700, Stefan Beller wrote: > >> + for (i = 0; i < o->line_buffer_nr; i++); >> + free((void*)o->line_buffer[i].line); > > I haven't looked at the patches yet, but this ";" on the for line is > almost certainly a typo (gcc catches it due to the misleading > indentation of the next line). Grr :/ I have spent hours trying to figure out why this does not work, questioning the design, my mental model of how pointers work and programming in general. /me should get gcc 6 and set -Wmisleading-indentation Thanks, Stefan > > -Peff
Re: [PATCH 18/19] diff: buffer all output if asked to
On Sat, May 13, 2017 at 09:01:16PM -0700, Stefan Beller wrote: > + for (i = 0; i < o->line_buffer_nr; i++); > + free((void*)o->line_buffer[i].line); I haven't looked at the patches yet, but this ";" on the for line is almost certainly a typo (gcc catches it due to the misleading indentation of the next line). -Peff
[PATCH 15/19] diff.c: convert diff_flush to use emit_line_*
In a later patch, I want to propose an option to detect moved lines in a diff, which cannot be done in a one-pass over the diff. Instead we need to go over the whole diff twice, because we cannot detect the first line of the two corresponding lines (+ and -) that got moved. So to prepare the diff machinery for two pass algorithms (i.e. buffer it all up and then operate on the result), move all emissions to places, such that the only emitting function is emit_line_0. This covers diff_flush. Signed-off-by: Stefan Beller--- diff.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/diff.c b/diff.c index 07041a35ad..386b28cf47 100644 --- a/diff.c +++ b/diff.c @@ -4873,7 +4873,9 @@ void diff_flush(struct diff_options *options) term, 1); if (options->stat_sep) { /* attach patch instead of inline */ - fputs(options->stat_sep, options->file); + emit_line(options, NULL, NULL, + options->stat_sep, + strlen(options->stat_sep)); } } -- 2.13.0.18.g183880de0a
[PATCH 05/19] diff.c: emit_line_0 can handle no color setting
In a later patch, I want to propose an option to detect moved lines in a diff, which cannot be done in a one-pass over the diff. Instead we need to go over the whole diff twice, because we cannot detect the first line of the two corresponding lines (+ and -) that got moved. So to prepare the diff machinery for two pass algorithms (i.e. buffer it all up and then operate on the result), move all emissions to places, such that the only emitting function is emit_line_0. In later patches we may pass lines that are not colored to the central function emit_line_0, so we need to emit the color only when it is non-NULL. Signed-off-by: Stefan Beller--- diff.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/diff.c b/diff.c index 381b572d76..48f0fb98dc 100644 --- a/diff.c +++ b/diff.c @@ -532,11 +532,13 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res len--; if (len || sign) { - fputs(set, file); + if (set) + fputs(set, file); if (sign) fputc(sign, file); fwrite(line, len, 1, file); - fputs(reset, file); + if (reset) + fputs(reset, file); } if (has_trailing_carriage_return) fputc('\r', file); -- 2.13.0.18.g183880de0a
[PATCH 04/19] diff.c: factor out diff_flush_patch_all_file_pairs
In a later patch we want to do more things before and after all filepairs are flushed. So factor flushing out all file pairs into its own function that the new code can be plugged in easily. Signed-off-by: Stefan Beller--- diff.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/diff.c b/diff.c index 4269b8dccf..381b572d76 100644 --- a/diff.c +++ b/diff.c @@ -4728,6 +4728,17 @@ void diff_warn_rename_limit(const char *varname, int needed, int degraded_cc) warning(_(rename_limit_advice), varname, needed); } +static void diff_flush_patch_all_file_pairs(struct diff_options *o) +{ + int i; + struct diff_queue_struct *q = _queued_diff; + for (i = 0; i < q->nr; i++) { + struct diff_filepair *p = q->queue[i]; + if (check_pair_status(p)) + diff_flush_patch(p, o); + } +} + void diff_flush(struct diff_options *options) { struct diff_queue_struct *q = _queued_diff; @@ -4825,11 +4836,7 @@ void diff_flush(struct diff_options *options) } } - for (i = 0; i < q->nr; i++) { - struct diff_filepair *p = q->queue[i]; - if (check_pair_status(p)) - diff_flush_patch(p, options); - } + diff_flush_patch_all_file_pairs(options); } if (output_format & DIFF_FORMAT_CALLBACK) -- 2.13.0.18.g183880de0a
[PATCH 13/19] diff.c: convert show_stats to use emit_line_*
In a later patch, I want to propose an option to detect moved lines in a diff, which cannot be done in a one-pass over the diff. Instead we need to go over the whole diff twice, because we cannot detect the first line of the two corresponding lines (+ and -) that got moved. So to prepare the diff machinery for two pass algorithms (i.e. buffer it all up and then operate on the result), move all emissions to places, such that the only emitting function is emit_line_0. We call print_stat_summary from builtin/apply, so we still need the version with a file pointer, so introduce print_stat_summary_0 that uses emit_line_* machinery and keep print_stat_summary with the same arguments around. Signed-off-by: Stefan Beller--- diff.c | 89 ++ diff.h | 4 +-- 2 files changed, 53 insertions(+), 40 deletions(-) diff --git a/diff.c b/diff.c index ffdb728810..91dc045a45 100644 --- a/diff.c +++ b/diff.c @@ -1529,20 +1529,19 @@ static int scale_linear(int it, int width, int max_change) return 1 + (it * (width - 1) / max_change); } -static void show_name(FILE *file, +static void show_name(struct strbuf *out, const char *prefix, const char *name, int len) { - fprintf(file, " %s%-*s |", prefix, len, name); + strbuf_addf(out, " %s%-*s |", prefix, len, name); } -static void show_graph(FILE *file, char ch, int cnt, const char *set, const char *reset) +static void show_graph(struct strbuf *out, char ch, int cnt, const char *set, const char *reset) { if (cnt <= 0) return; - fprintf(file, "%s", set); - while (cnt--) - putc(ch, file); - fprintf(file, "%s", reset); + strbuf_addstr(out, set); + strbuf_addchars(out, ch, cnt); + strbuf_addstr(out, reset); } static void fill_print_name(struct diffstat_file *file) @@ -1566,14 +1565,16 @@ static void fill_print_name(struct diffstat_file *file) file->print_name = pname; } -int print_stat_summary(FILE *fp, int files, int insertions, int deletions) +void print_stat_summary_0(struct diff_options *options, int files, + int insertions, int deletions) { struct strbuf sb = STRBUF_INIT; - int ret; if (!files) { assert(insertions == 0 && deletions == 0); - return fprintf(fp, "%s\n", " 0 files changed"); + strbuf_addstr(, " 0 files changed"); + emit_line(options, NULL, NULL, sb.buf, sb.len); + return; } strbuf_addf(, @@ -1600,9 +1601,17 @@ int print_stat_summary(FILE *fp, int files, int insertions, int deletions) deletions); } strbuf_addch(, '\n'); - ret = fputs(sb.buf, fp); + emit_line(options, NULL, NULL, sb.buf, sb.len); strbuf_release(); - return ret; +} + +void print_stat_summary(FILE *fp, int files, + int insertions, int deletions) +{ + struct diff_options o; + memset(, 0, sizeof(o)); + o.file = fp; + print_stat_summary_0(, files, insertions, deletions); } static void show_stats(struct diffstat_t *data, struct diff_options *options) @@ -1612,13 +1621,13 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) int total_files = data->nr, count; int width, name_width, graph_width, number_width = 0, bin_width = 0; const char *reset, *add_c, *del_c; - const char *line_prefix = ""; int extra_shown = 0; + const char *line_prefix = diff_line_prefix(options); + struct strbuf out = STRBUF_INIT; if (data->nr == 0) return; - line_prefix = diff_line_prefix(options); count = options->stat_count ? options->stat_count : data->nr; reset = diff_get_color_opt(options, DIFF_RESET); @@ -1772,26 +1781,29 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) } if (file->is_binary) { - fprintf(options->file, "%s", line_prefix); - show_name(options->file, prefix, name, len); - fprintf(options->file, " %*s", number_width, "Bin"); + show_name(, prefix, name, len); + strbuf_addf(, " %*s", number_width, "Bin"); if (!added && !deleted) { - putc('\n', options->file); + strbuf_addch(, '\n'); + emit_line(options, NULL, NULL, out.buf, out.len); + strbuf_reset(); continue; } - fprintf(options->file, " %s%"PRIuMAX"%s", + strbuf_addf(, " %s%"PRIuMAX"%s", del_c, deleted, reset); -
[PATCH 09/19] diff.c: convert emit_rewrite_diff to use emit_line_*
In a later patch, I want to propose an option to detect moved lines in a diff, which cannot be done in a one-pass over the diff. Instead we need to go over the whole diff twice, because we cannot detect the first line of the two corresponding lines (+ and -) that got moved. So to prepare the diff machinery for two pass algorithms (i.e. buffer it all up and then operate on the result), move all emissions to places, such that the only emitting function is emit_line_0. This covers emit_rewrite_diff. Signed-off-by: Stefan Beller--- diff.c | 33 ++--- 1 file changed, 18 insertions(+), 15 deletions(-) diff --git a/diff.c b/diff.c index 8e00010bf4..e4b46fee4f 100644 --- a/diff.c +++ b/diff.c @@ -708,17 +708,17 @@ static void remove_tempfile(void) } } -static void print_line_count(FILE *file, int count) +static void add_line_count(struct strbuf *out, int count) { switch (count) { case 0: - fprintf(file, "0,0"); + strbuf_addstr(out, "0,0"); break; case 1: - fprintf(file, "1"); + strbuf_addstr(out, "1"); break; default: - fprintf(file, "1,%d", count); + strbuf_addf(out, "1,%d", count); break; } } @@ -772,7 +772,7 @@ static void emit_rewrite_diff(const char *name_a, char *data_one, *data_two; size_t size_one, size_two; struct emit_callback ecbdata; - const char *line_prefix = diff_line_prefix(o); + struct strbuf out = STRBUF_INIT; if (diff_mnemonic_prefix && DIFF_OPT_TST(o, REVERSE_DIFF)) { a_prefix = o->b_prefix; @@ -810,20 +810,23 @@ static void emit_rewrite_diff(const char *name_a, ecbdata.lno_in_preimage = 1; ecbdata.lno_in_postimage = 1; + emit_line_fmt(o, metainfo, reset, "--- %s%s\n", a_name.buf, name_a_tab); + emit_line_fmt(o, metainfo, reset, "+++ %s%s\n", b_name.buf, name_b_tab); + lc_a = count_lines(data_one, size_one); lc_b = count_lines(data_two, size_two); - fprintf(o->file, - "%s%s--- %s%s%s\n%s%s+++ %s%s%s\n%s%s@@ -", - line_prefix, metainfo, a_name.buf, name_a_tab, reset, - line_prefix, metainfo, b_name.buf, name_b_tab, reset, - line_prefix, fraginfo); + + strbuf_addstr(, "@@ -"); if (!o->irreversible_delete) - print_line_count(o->file, lc_a); + add_line_count(, lc_a); else - fprintf(o->file, "?,?"); - fprintf(o->file, " +"); - print_line_count(o->file, lc_b); - fprintf(o->file, " @@%s\n", reset); + strbuf_addstr(, "?,?"); + strbuf_addstr(, " +"); + add_line_count(, lc_b); + strbuf_addstr(, " @@\n"); + emit_line(o, fraginfo, reset, out.buf, out.len); + strbuf_release(); + if (lc_a && !o->irreversible_delete) emit_rewrite_lines(, '-', data_one, size_one); if (lc_b) -- 2.13.0.18.g183880de0a
[PATCH 18/19] diff: buffer all output if asked to
Introduce a new option 'use_buffer' in the struct diff_options which controls whether all output is buffered up until all output is available. We'll have two new structs in diff.h, one of them 'buffered_patch_line' will be used to buffer each line, and the other 'buffered_patch_file_pair' will store information about information relevant on a per file basis. The buffered_patch_line will duplicate the memory of the line to buffer as that is easiest to reason about for now. In a future patch we may want to decrease the memory usage by not duplicating all output for buffering but rather we may want to store offsets into the file or in case of hunk descriptions such as the similarity score, we could just store the relevant number and reproduce the text later on. This approach was chosen as a first step because it is quite simple compared to the alternative with less memory footprint. emit_line_0 factors out the emission part into emit_line_emission, and depending on the diff_options->use_buffer the emission will be performed directly when calling emit_line_0 or after the whole process is done, i.e. by buffering we have add the possibility for a second pass over the whole output before doing the actual output. In 6440d34 (2012-03-14, diff: tweak a _copy_ of diff_options with word-diff) we introduced a duplicate diff options struct for word emissions as we may have different regex settings in there. When buffering the output, we need to operate on just one buffer, so we have to copy back the emissions of the word buffer into the main buffer. Unconditionally enable output via buffer in this patch as it yields a great opportunity for testing, i.e. all the diff tests from the test suite pass without having reordering issues (i.e. only parts of the output got buffered, and we forgot to buffer other parts). The test suite passes, which gives confidence that we converted all functions to use emit_line_* for output. Signed-off-by: Stefan Beller--- diff.c | 158 ++--- diff.h | 39 2 files changed, 181 insertions(+), 16 deletions(-) diff --git a/diff.c b/diff.c index 08dcc56bb9..dbab7fb44e 100644 --- a/diff.c +++ b/diff.c @@ -516,29 +516,29 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2, ecbdata->blank_at_eof_in_postimage = (at - l2) + 1; } -static void emit_line_0(struct diff_options *o, const char *set, const char *reset, - int sign, const char *line, int len) +static void emit_buffered_patch_line(struct diff_options *o, +struct buffered_patch_line *e) { - int has_trailing_newline, has_trailing_carriage_return; + int has_trailing_newline, has_trailing_carriage_return, len = e->len; FILE *file = o->file; fputs(diff_line_prefix(o), file); - has_trailing_newline = (len > 0 && line[len-1] == '\n'); + has_trailing_newline = (len > 0 && e->line[len-1] == '\n'); if (has_trailing_newline) len--; - has_trailing_carriage_return = (len > 0 && line[len-1] == '\r'); + has_trailing_carriage_return = (len > 0 && e->line[len-1] == '\r'); if (has_trailing_carriage_return) len--; - if (len || sign) { - if (set) - fputs(set, file); - if (sign) - fputc(sign, file); - fwrite(line, len, 1, file); - if (reset) - fputs(reset, file); + if (len || e->sign) { + if (e->set) + fputs(e->set, file); + if (e->sign) + fputc(e->sign, file); + fwrite(e->line, len, 1, file); + if (e->reset) + fputs(e->reset, file); } if (has_trailing_carriage_return) fputc('\r', file); @@ -546,6 +546,65 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res fputc('\n', file); } +static void emit_buffered_patch_line_ws(struct diff_options *o, + struct buffered_patch_line *e, + const char *ws, unsigned ws_rule) +{ + struct buffered_patch_line s = {e->set, e->reset, "", 0, e->sign}; + + emit_buffered_patch_line(o, ); + ws_check_emit(e->line, e->len, ws_rule, + o->file, e->set, e->reset, ws); +} + +static void process_next_buffered_patch_line(struct diff_options *o, int line_no) +{ + struct buffered_patch_line *e = >line_buffer[line_no]; + + const char *ws = o->current_filepair->ws; + unsigned ws_rule = o->current_filepair->ws_rule; + + switch (e->state) { + case BPL_EMIT_LINE_ASIS: + emit_buffered_patch_line(o, e); + break; + case
[RFC PATCH 00/19] Diff machine: highlight moved lines.
For details on *why* see the commit message of the last commit. The first five patches are slight refactorings to get into good shape, the next patches are funneling all output through emit_line_*. The second last patch introduces an option to buffer up all output before printing, and then the last patch can color up moved lines of code. Any feedback welcome. Thanks, Stefan Stefan Beller (19): diff: readability fix diff: move line ending check into emit_hunk_header diff.c: drop 'nofirst' from emit_line_0 diff.c: factor out diff_flush_patch_all_file_pairs diff.c: emit_line_0 can handle no color setting diff: add emit_line_fmt diff.c: convert fn_out_consume to use emit_line_* diff.c: convert builtin_diff to use emit_line_* diff.c: convert emit_rewrite_diff to use emit_line_* diff.c: convert emit_rewrite_lines to use emit_line_* submodule.c: convert show_submodule_summary to use emit_line_fmt diff.c: convert emit_binary_diff_body to use emit_line_* diff.c: convert show_stats to use emit_line_* diff.c: convert word diffing to use emit_line_* diff.c: convert diff_flush to use emit_line_* diff.c: convert diff_summary to use emit_line_* diff.c: factor out emit_line_ws for coloring whitespaces diff: buffer all output if asked to diff.c: color moved lines differently Documentation/config.txt | 12 +- diff.c | 815 ++--- diff.h | 69 +++- submodule.c| 78 ++--- submodule.h| 9 +- t/t4015-diff-whitespace.sh | 229 + 6 files changed, 960 insertions(+), 252 deletions(-) -- 2.13.0.18.g183880de0a
[PATCH 01/19] diff: readability fix
We already have dereferenced 'p->two' into a local variable 'two'. Use that. Signed-off-by: Stefan Beller--- diff.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/diff.c b/diff.c index 74283d9001..3f5bf8b5a4 100644 --- a/diff.c +++ b/diff.c @@ -3283,8 +3283,8 @@ static void run_diff(struct diff_filepair *p, struct diff_options *o) const char *other; const char *attr_path; - name = p->one->path; - other = (strcmp(name, p->two->path) ? p->two->path : NULL); + name = one->path; + other = (strcmp(name, two->path) ? two->path : NULL); attr_path = name; if (o->prefix_length) strip_prefix(o->prefix_length, , ); -- 2.13.0.18.g183880de0a
[PATCH 12/19] diff.c: convert emit_binary_diff_body to use emit_line_*
In a later patch, I want to propose an option to detect moved lines in a diff, which cannot be done in a one-pass over the diff. Instead we need to go over the whole diff twice, because we cannot detect the first line of the two corresponding lines (+ and -) that got moved. So to prepare the diff machinery for two pass algorithms (i.e. buffer it all up and then operate on the result), move all emissions to places, such that the only emitting function is emit_line_0. This covers emit_binary_diff_body. Signed-off-by: Stefan Beller--- diff.c | 39 ++- 1 file changed, 22 insertions(+), 17 deletions(-) diff --git a/diff.c b/diff.c index 45ec311828..ffdb728810 100644 --- a/diff.c +++ b/diff.c @@ -2233,8 +2233,8 @@ static unsigned char *deflate_it(char *data, return deflated; } -static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two, - const char *prefix) +static void emit_binary_diff_body(struct diff_options *o, + mmfile_t *one, mmfile_t *two) { void *cp; void *delta; @@ -2263,13 +2263,12 @@ static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two, } if (delta && delta_size < deflate_size) { - fprintf(file, "%sdelta %lu\n", prefix, orig_size); + emit_line_fmt(o, NULL, NULL, "delta %lu\n", orig_size); free(deflated); data = delta; data_size = delta_size; - } - else { - fprintf(file, "%sliteral %lu\n", prefix, two->size); + } else { + emit_line_fmt(o, NULL, NULL, "literal %lu\n", two->size); free(delta); data = deflated; data_size = deflate_size; @@ -2278,8 +2277,9 @@ static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two, /* emit data encoded in base85 */ cp = data; while (data_size) { + int len; int bytes = (52 < data_size) ? 52 : data_size; - char line[70]; + char line[71]; data_size -= bytes; if (bytes <= 26) line[0] = bytes + 'A' - 1; @@ -2287,20 +2287,25 @@ static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two, line[0] = bytes - 26 + 'a' - 1; encode_85(line + 1, cp, bytes); cp = (char *) cp + bytes; - fprintf(file, "%s", prefix); - fputs(line, file); - fputc('\n', file); + + len = strlen(line); + line[len++] = '\n'; + line[len] = '\0'; + + emit_line(o, NULL, NULL, line, len); } - fprintf(file, "%s\n", prefix); + emit_line(o, NULL, NULL, "\n", 1); free(data); } -static void emit_binary_diff(FILE *file, mmfile_t *one, mmfile_t *two, -const char *prefix) +static void emit_binary_diff(struct diff_options *o, +mmfile_t *one, mmfile_t *two) { - fprintf(file, "%sGIT binary patch\n", prefix); - emit_binary_diff_body(file, one, two, prefix); - emit_binary_diff_body(file, two, one, prefix); + const char *s = "GIT binary patch\n"; + const int len = strlen(s); + emit_line(o, NULL, NULL, s, len); + emit_binary_diff_body(o, one, two); + emit_binary_diff_body(o, two, one); } int diff_filespec_is_binary(struct diff_filespec *one) @@ -2489,7 +2494,7 @@ static void builtin_diff(const char *name_a, header.buf, header.len); strbuf_reset(); if (DIFF_OPT_TST(o, BINARY)) - emit_binary_diff(o->file, , , line_prefix); + emit_binary_diff(o, , ); else emit_line_fmt(o, NULL, NULL, "Binary files %s and %s differ\n", -- 2.13.0.18.g183880de0a
[PATCH 06/19] diff: add emit_line_fmt
In the following patches we'll convert all printing functions to use the emit_line_* family of functions. Many of the printing functions to be converted are formatted. So offer a formatted function in the emit_line function family as well. Signed-off-by: Stefan Beller--- diff.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/diff.c b/diff.c index 48f0fb98dc..aef159a919 100644 --- a/diff.c +++ b/diff.c @@ -552,6 +552,20 @@ static void emit_line(struct diff_options *o, const char *set, const char *reset emit_line_0(o, set, reset, 0, line, len); } +static void emit_line_fmt(struct diff_options *o, + const char *set, const char *reset, + const char *fmt, ...) +{ + struct strbuf sb = STRBUF_INIT; + va_list ap; + va_start(ap, fmt); + strbuf_vaddf(, fmt, ap); + va_end(ap); + + emit_line(o, set, reset, sb.buf, sb.len); + strbuf_release(); +} + static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len) { if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) && -- 2.13.0.18.g183880de0a
[PATCH 03/19] diff.c: drop 'nofirst' from emit_line_0
In 250f79930d (diff.c: split emit_line() from the first char and the rest of the line, 2009-09-14) we introduced the local variable 'nofirst' that indicates if we have no first sign character. With the given implementation we had to use an extra variable unlike reusing 'first' because the lines first character could be '\0'. Change the meaning of the 'first' argument to not mean the first character of the line, but rather just containing the sign that is prepended to the line. Refactor emit_line to not include the lines first character, but pass the complete line as well as a '\0' sign, which now serves as an indication not to print a sign. With this patch other callers hard code the sign (which are '+', '-', ' ' and '\\') such that we do not run into unexpectedly emitting an error-nous '\0'. The audit of the caller revealed that the sign cannot be '\n' or '\r', so remove that condition for trailing newline or carriage return in the sign; the else part of the condition handles the len==0 perfectly, so we can drop the if/else construct. Signed-off-by: Stefan Beller--- diff.c | 40 +--- 1 file changed, 17 insertions(+), 23 deletions(-) diff --git a/diff.c b/diff.c index c2ed605cd0..4269b8dccf 100644 --- a/diff.c +++ b/diff.c @@ -517,33 +517,24 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2, } static void emit_line_0(struct diff_options *o, const char *set, const char *reset, - int first, const char *line, int len) + int sign, const char *line, int len) { int has_trailing_newline, has_trailing_carriage_return; - int nofirst; FILE *file = o->file; fputs(diff_line_prefix(o), file); - if (len == 0) { - has_trailing_newline = (first == '\n'); - has_trailing_carriage_return = (!has_trailing_newline && - (first == '\r')); - nofirst = has_trailing_newline || has_trailing_carriage_return; - } else { - has_trailing_newline = (len > 0 && line[len-1] == '\n'); - if (has_trailing_newline) - len--; - has_trailing_carriage_return = (len > 0 && line[len-1] == '\r'); - if (has_trailing_carriage_return) - len--; - nofirst = 0; - } + has_trailing_newline = (len > 0 && line[len-1] == '\n'); + if (has_trailing_newline) + len--; + has_trailing_carriage_return = (len > 0 && line[len-1] == '\r'); + if (has_trailing_carriage_return) + len--; - if (len || !nofirst) { + if (len || sign) { fputs(set, file); - if (!nofirst) - fputc(first, file); + if (sign) + fputc(sign, file); fwrite(line, len, 1, file); fputs(reset, file); } @@ -556,7 +547,7 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res static void emit_line(struct diff_options *o, const char *set, const char *reset, const char *line, int len) { - emit_line_0(o, set, reset, line[0], line+1, len-1); + emit_line_0(o, set, reset, 0, line, len); } static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len) @@ -4822,9 +4813,12 @@ void diff_flush(struct diff_options *options) if (output_format & DIFF_FORMAT_PATCH) { if (separator) { - fprintf(options->file, "%s%c", - diff_line_prefix(options), - options->line_termination); + char term[2]; + term[0] = options->line_termination; + term[1] = '\0'; + + emit_line(options, NULL, NULL, + term, 1); if (options->stat_sep) { /* attach patch instead of inline */ fputs(options->stat_sep, options->file); -- 2.13.0.18.g183880de0a
[PATCH 08/19] diff.c: convert builtin_diff to use emit_line_*
In a later patch, I want to propose an option to detect moved lines in a diff, which cannot be done in a one-pass over the diff. Instead we need to go over the whole diff twice, because we cannot detect the first line of the two corresponding lines (+ and -) that got moved. So to prepare the diff machinery for two pass algorithms (i.e. buffer it all up and then operate on the result), move all emissions to places, such that the only emitting function is emit_line_0. This covers builtin_diff. Signed-off-by: Stefan Beller--- diff.c | 43 ++- 1 file changed, 26 insertions(+), 17 deletions(-) diff --git a/diff.c b/diff.c index 93343a9ccc..8e00010bf4 100644 --- a/diff.c +++ b/diff.c @@ -1293,8 +1293,9 @@ static void fn_out_consume(void *priv, char *line, unsigned long len) o->found_changes = 1; if (ecbdata->header) { - fprintf(o->file, "%s", ecbdata->header->buf); - strbuf_reset(ecbdata->header); + emit_line(o, NULL, NULL, + ecbdata->header->buf, ecbdata->header->len); + strbuf_release(ecbdata->header); ecbdata->header = NULL; } @@ -2407,7 +2408,7 @@ static void builtin_diff(const char *name_a, b_two = quote_two(b_prefix, name_b + (*name_b == '/')); lbl[0] = DIFF_FILE_VALID(one) ? a_one : "/dev/null"; lbl[1] = DIFF_FILE_VALID(two) ? b_two : "/dev/null"; - strbuf_addf(, "%s%sdiff --git %s %s%s\n", line_prefix, meta, a_one, b_two, reset); + strbuf_addf(, "%sdiff --git %s %s%s\n", meta, a_one, b_two, reset); if (lbl[0][0] == '/') { /* /dev/null */ strbuf_addf(, "%s%snew file mode %06o%s\n", line_prefix, meta, two->mode, reset); @@ -2439,7 +2440,7 @@ static void builtin_diff(const char *name_a, if (complete_rewrite && (textconv_one || !diff_filespec_is_binary(one)) && (textconv_two || !diff_filespec_is_binary(two))) { - fprintf(o->file, "%s", header.buf); + emit_line(o, NULL, NULL, header.buf, header.len); strbuf_reset(); emit_rewrite_diff(name_a, name_b, one, two, textconv_one, textconv_two, o); @@ -2449,7 +2450,8 @@ static void builtin_diff(const char *name_a, } if (o->irreversible_delete && lbl[1][0] == '/') { - fprintf(o->file, "%s", header.buf); + if (header.len) + emit_line(o, NULL, NULL, header.buf, header.len); strbuf_reset(); goto free_ab_and_return; } else if (!DIFF_OPT_TST(o, TEXT) && @@ -2459,13 +2461,16 @@ static void builtin_diff(const char *name_a, S_ISREG(one->mode) && S_ISREG(two->mode) && !DIFF_OPT_TST(o, BINARY)) { if (!oidcmp(>oid, >oid)) { - if (must_show_header) - fprintf(o->file, "%s", header.buf); + if (must_show_header && header.len) + emit_line(o, NULL, NULL, + header.buf, header.len); goto free_ab_and_return; } - fprintf(o->file, "%s", header.buf); - fprintf(o->file, "%sBinary files %s and %s differ\n", - line_prefix, lbl[0], lbl[1]); + if (header.len) + emit_line(o, NULL, NULL, + header.buf, header.len); + emit_line_fmt(o, 0, 0, "Binary files %s and %s differ\n", + lbl[0], lbl[1]); goto free_ab_and_return; } if (fill_mmfile(, one) < 0 || fill_mmfile(, two) < 0) @@ -2473,17 +2478,21 @@ static void builtin_diff(const char *name_a, /* Quite common confusing case */ if (mf1.size == mf2.size && !memcmp(mf1.ptr, mf2.ptr, mf1.size)) { - if (must_show_header) - fprintf(o->file, "%s", header.buf); + if (must_show_header && header.len) + emit_line(o, NULL, NULL, + header.buf, header.len); goto free_ab_and_return; } - fprintf(o->file, "%s", header.buf); + if (header.len) + emit_line(o, NULL, NULL, + header.buf, header.len); strbuf_reset(); if (DIFF_OPT_TST(o, BINARY)) emit_binary_diff(o->file,
[PATCH 19/19] diff.c: color moved lines differently
When there is a lot of code moved around such as in 11979b9 (2005-11-18, "http.c: reorder to avoid compilation failure.") for example, the review process is quite hard, as it is not mentally challenging. It is a rather tedious process, that gets boring quickly. However you still need to read through all of the code to make sure the moved lines are there as supposed. While it is trivial to color up a patch like the following $ git diff diff --git a/file2.c b/file2.c index 9163a0f..8e66dc0 100644 --- a/file2.c +++ b/file2.c @@ -3,13 +3,6 @@ void *xmemdupz(const void *data, size_t len) return memcpy(xmallocz(len), data, len); } -int secure_foo(struct user *u) -{ - if (!u->is_allowed_foo) - return; - foo(u); -} - char *xstrndup(const char *str, size_t len) { char *p = memchr(str, '\0', len); diff --git a/test.c b/test.c index a95e6fe..81eb0eb 100644 --- a/test.c +++ b/test.c @@ -18,6 +18,13 @@ ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset) return total; } +int secure_foo(struct user *u) +{ + if (!u->is_allowed_foo) + return; + foo(u); +} + int xdup(int fd) { int ret = dup(fd); as in this patch all lines that add or remove lines should be colored in the new color that indicates moved lines. However the intention of this patch is to aid reviewers to spotting permutations in the moved code. So consider the following malicious move: diff --git a/file2.c b/file2.c index 9163a0f..8e66dc0 100644 --- a/file2.c +++ b/file2.c @@ -3,13 +3,6 @@ void *xmemdupz(const void *data, size_t len) return memcpy(xmallocz(len), data, len); } -int secure_foo(struct user *u) -{ - if (!u->is_allowed_foo) - return; - foo(u); -} - char *xstrndup(const char *str, size_t len) { char *p = memchr(str, '\0', len); diff --git a/test.c b/test.c index a95e6fe..a679c40 100644 --- a/test.c +++ b/test.c @@ -18,6 +18,13 @@ ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset) return total; } +int secure_foo(struct user *u) +{ + foo(u); + if (!u->is_allowed_foo) + return; +} + int xdup(int fd) { int ret = dup(fd); If the moved code is larger, it is easier to hide some permutation in the code, which is why we would not want to color all lines as "moved" in this case. So we do not just need to color lines differently that are added and removed in the same diff, we need to tweak the algorithm a bit more. As the reviewers attention should be brought to the places, where the difference is introduced to the moved code, we cannot just have one new color for all of moved code. First I implemented an alternative design, which would show a moved hunk in one color, and its boundaries in another color. This idea was error prone as it inspected each line and its neighboring lines to determine if the line was (a) moved and (b) if was deep inside a hunk by having matching neighboring lines. This is unreliable as the we can construct hunks which have equal neighbors that just exceed the number of lines inspected. See one of the tests as an example. Instead this provides a dynamic programming greedy algorithm that finds the largest moved hunk and then switches color to the alternative color for the next hunk. By doing this any permutation is recognized and displayed. That implies that there is no dedicated boundary or inside-hunk color, but instead we'll have just two colors alternating for hunks. It would be a bit more UX friendly if the two corresponding hunks (of added and deleted lines) for one move would get the same color id. (Both get "regular moved" or "alternative moved"). This problem is deferred to a later patch for now. Algorithm-by: Jonathan TanSigned-off-by: Stefan Beller --- Documentation/config.txt | 12 +- diff.c | 265 ++--- diff.h | 21 +++- t/t4015-diff-whitespace.sh | 229 +++ 4 files changed, 506 insertions(+), 21 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 475e874d51..90403c06e3 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1051,14 +1051,22 @@ This does not affect linkgit:git-format-patch[1] or the 'git-diff-{asterisk}' plumbing commands. Can be overridden on the command line with the `--color[=]` option. +color.moved:: + A boolean value, whether a diff should color moved lines + differently. The moved lines are searched for in the diff only. + Duplicated lines from somewhere in the
[PATCH 16/19] diff.c: convert diff_summary to use emit_line_*
In a later patch, I want to propose an option to detect moved lines in a diff, which cannot be done in a one-pass over the diff. Instead we need to go over the whole diff twice, because we cannot detect the first line of the two corresponding lines (+ and -) that got moved. So to prepare the diff machinery for two pass algorithms (i.e. buffer it all up and then operate on the result), move all emissions to places, such that the only emitting function is emit_line_0. This covers diff_summary. Signed-off-by: Stefan Beller--- diff.c | 62 +- 1 file changed, 33 insertions(+), 29 deletions(-) diff --git a/diff.c b/diff.c index 386b28cf47..899dc69dff 100644 --- a/diff.c +++ b/diff.c @@ -4504,67 +4504,71 @@ static void flush_one_pair(struct diff_filepair *p, struct diff_options *opt) } } -static void show_file_mode_name(FILE *file, const char *newdelete, struct diff_filespec *fs) +static void show_file_mode_name(struct diff_options *opt, const char *newdelete, struct diff_filespec *fs) { + struct strbuf sb = STRBUF_INIT; if (fs->mode) - fprintf(file, " %s mode %06o ", newdelete, fs->mode); + strbuf_addf(, " %s mode %06o ", newdelete, fs->mode); else - fprintf(file, " %s ", newdelete); - write_name_quoted(fs->path, file, '\n'); + strbuf_addf(, " %s ", newdelete); + + quote_c_style(fs->path, , NULL, 0); + strbuf_addch(, '\n'); + emit_line(opt, NULL, NULL, sb.buf, sb.len); + strbuf_release(); } -static void show_mode_change(FILE *file, struct diff_filepair *p, int show_name, - const char *line_prefix) +static void show_mode_change(struct diff_options *opt, struct diff_filepair *p, + int show_name) { if (p->one->mode && p->two->mode && p->one->mode != p->two->mode) { - fprintf(file, "%s mode change %06o => %06o%c", line_prefix, p->one->mode, - p->two->mode, show_name ? ' ' : '\n'); + struct strbuf sb = STRBUF_INIT; if (show_name) { - write_name_quoted(p->two->path, file, '\n'); + strbuf_addch(, ' '); + quote_c_style(p->two->path, , NULL, 0); } + emit_line_fmt(opt, NULL, NULL, + " mode change %06o => %06o%s\n", + p->one->mode, p->two->mode, + show_name ? sb.buf : ""); + strbuf_release(); } } -static void show_rename_copy(FILE *file, const char *renamecopy, struct diff_filepair *p, - const char *line_prefix) +static void show_rename_copy(struct diff_options *opt, const char *renamecopy, + struct diff_filepair *p) { char *names = pprint_rename(p->one->path, p->two->path); - - fprintf(file, " %s %s (%d%%)\n", renamecopy, names, similarity_index(p)); + emit_line_fmt(opt, NULL, NULL, " %s %s (%d%%)\n", renamecopy, names, similarity_index(p)); free(names); - show_mode_change(file, p, 0, line_prefix); + show_mode_change(opt, p, 0); } static void diff_summary(struct diff_options *opt, struct diff_filepair *p) { - FILE *file = opt->file; - const char *line_prefix = diff_line_prefix(opt); - switch(p->status) { case DIFF_STATUS_DELETED: - fputs(line_prefix, file); - show_file_mode_name(file, "delete", p->one); + show_file_mode_name(opt, "delete", p->one); break; case DIFF_STATUS_ADDED: - fputs(line_prefix, file); - show_file_mode_name(file, "create", p->two); + show_file_mode_name(opt, "create", p->two); break; case DIFF_STATUS_COPIED: - fputs(line_prefix, file); - show_rename_copy(file, "copy", p, line_prefix); + show_rename_copy(opt, "copy", p); break; case DIFF_STATUS_RENAMED: - fputs(line_prefix, file); - show_rename_copy(file, "rename", p, line_prefix); + show_rename_copy(opt, "rename", p); break; default: if (p->score) { - fprintf(file, "%s rewrite ", line_prefix); - write_name_quoted(p->two->path, file, ' '); - fprintf(file, "(%d%%)\n", similarity_index(p)); + struct strbuf sb = STRBUF_INIT; + strbuf_addstr(, " rewrite "); + quote_c_style(p->two->path, , NULL, 0); + strbuf_addf(, " (%d%%)\n", similarity_index(p)); + emit_line(opt, NULL, NULL, sb.buf, sb.len); } - show_mode_change(file, p, !p->score, line_prefix); +
[PATCH 10/19] diff.c: convert emit_rewrite_lines to use emit_line_*
In a later patch, I want to propose an option to detect moved lines in a diff, which cannot be done in a one-pass over the diff. Instead we need to go over the whole diff twice, because we cannot detect the first line of the two corresponding lines (+ and -) that got moved. So to prepare the diff machinery for two pass algorithms (i.e. buffer it all up and then operate on the result), move all emissions to places, such that the only emitting function is emit_line_0. This covers emit_rewrite_lines. Signed-off-by: Stefan Beller--- diff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/diff.c b/diff.c index e4b46fee4f..369c804f03 100644 --- a/diff.c +++ b/diff.c @@ -748,7 +748,7 @@ static void emit_rewrite_lines(struct emit_callback *ecb, if (!endp) { const char *context = diff_get_color(ecb->color_diff, DIFF_CONTEXT); - putc('\n', ecb->opt->file); + emit_line(ecb->opt, NULL, NULL, "\n", 1); emit_line_0(ecb->opt, context, reset, '\\', nneof, strlen(nneof)); } -- 2.13.0.18.g183880de0a
[PATCH 14/19] diff.c: convert word diffing to use emit_line_*
In a later patch, I want to propose an option to detect moved lines in a diff, which cannot be done in a one-pass over the diff. Instead we need to go over the whole diff twice, because we cannot detect the first line of the two corresponding lines (+ and -) that got moved. So to prepare the diff machinery for two pass algorithms (i.e. buffer it all up and then operate on the result), move all emissions to places, such that the only emitting function is emit_line_0. This covers all code related to diffing words. Signed-off-by: Stefan Beller--- diff.c | 56 1 file changed, 28 insertions(+), 28 deletions(-) diff --git a/diff.c b/diff.c index 91dc045a45..07041a35ad 100644 --- a/diff.c +++ b/diff.c @@ -886,37 +886,38 @@ struct diff_words_data { struct diff_words_style *style; }; -static int fn_out_diff_words_write_helper(FILE *fp, +static int fn_out_diff_words_write_helper(struct diff_options *o, struct diff_words_style_elem *st_el, const char *newline, size_t count, const char *buf, const char *line_prefix) { - int print = 0; + struct strbuf sb = STRBUF_INIT; while (count) { char *p = memchr(buf, '\n', count); - if (print) - fputs(line_prefix, fp); + if (p != buf) { - if (st_el->color && fputs(st_el->color, fp) < 0) - return -1; - if (fputs(st_el->prefix, fp) < 0 || - fwrite(buf, p ? p - buf : count, 1, fp) != 1 || - fputs(st_el->suffix, fp) < 0) - return -1; - if (st_el->color && *st_el->color - && fputs(GIT_COLOR_RESET, fp) < 0) - return -1; + if (st_el->color) + strbuf_addstr(, st_el->color); + strbuf_addstr(, st_el->prefix); + strbuf_add(, buf, p ? p - buf : count); + strbuf_addstr(, st_el->suffix); + if (st_el->color && *st_el->color) + strbuf_addstr(, GIT_COLOR_RESET); } if (!p) - return 0; - if (fputs(newline, fp) < 0) - return -1; + goto out; + strbuf_addstr(, newline); + emit_line(o, NULL, NULL, sb.buf, sb.len); + strbuf_reset(); count -= p + 1 - buf; buf = p + 1; - print = 1; } +out: + if (sb.len) + emit_line(o, NULL, NULL, sb.buf, sb.len); + strbuf_release(); return 0; } @@ -994,25 +995,25 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len) } else plus_begin = plus_end = diff_words->plus.orig[plus_first].end; - if (color_words_output_graph_prefix(diff_words)) { - fputs(line_prefix, diff_words->opt->file); - } + if (color_words_output_graph_prefix(diff_words)) + emit_line(diff_words->opt, NULL, NULL, "", 0); + if (diff_words->current_plus != plus_begin) { - fn_out_diff_words_write_helper(diff_words->opt->file, + fn_out_diff_words_write_helper(diff_words->opt, >ctx, style->newline, plus_begin - diff_words->current_plus, diff_words->current_plus, line_prefix); if (*(plus_begin - 1) == '\n') - fputs(line_prefix, diff_words->opt->file); + emit_line(diff_words->opt, NULL, NULL, "", 0); } if (minus_begin != minus_end) { - fn_out_diff_words_write_helper(diff_words->opt->file, + fn_out_diff_words_write_helper(diff_words->opt, >old, style->newline, minus_end - minus_begin, minus_begin, line_prefix); } if (plus_begin != plus_end) { - fn_out_diff_words_write_helper(diff_words->opt->file, + fn_out_diff_words_write_helper(diff_words->opt, >new, style->newline, plus_end - plus_begin, plus_begin, line_prefix); @@ -1109,8 +1110,7 @@ static void diff_words_show(struct diff_words_data *diff_words) /* special case: only removal */ if (!diff_words->plus.text.size) { - fputs(line_prefix, diff_words->opt->file); -
[PATCH 11/19] submodule.c: convert show_submodule_summary to use emit_line_fmt
In a later patch, I want to propose an option to detect moved lines in a diff, which cannot be done in a one-pass over the diff. Instead we need to go over the whole diff twice, because we cannot detect the first line of the two corresponding lines (+ and -) that got moved. So to prepare the diff machinery for two pass algorithms (i.e. buffer it all up and then operate on the result), move all emissions to places, such that the only emitting function is emit_line_0. This prepares the code for submodules to go through the emit_line function. Signed-off-by: Stefan Beller--- diff.c | 18 +++--- diff.h | 5 submodule.c | 78 ++--- submodule.h | 9 +++ 4 files changed, 55 insertions(+), 55 deletions(-) diff --git a/diff.c b/diff.c index 369c804f03..45ec311828 100644 --- a/diff.c +++ b/diff.c @@ -546,15 +546,15 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res fputc('\n', file); } -static void emit_line(struct diff_options *o, const char *set, const char *reset, - const char *line, int len) +void emit_line(struct diff_options *o, const char *set, const char *reset, + const char *line, int len) { emit_line_0(o, set, reset, 0, line, len); } -static void emit_line_fmt(struct diff_options *o, - const char *set, const char *reset, - const char *fmt, ...) +void emit_line_fmt(struct diff_options *o, + const char *set, const char *reset, + const char *fmt, ...) { struct strbuf sb = STRBUF_INIT; va_list ap; @@ -2379,8 +2379,7 @@ static void builtin_diff(const char *name_a, (!two->mode || S_ISGITLINK(two->mode))) { const char *del = diff_get_color_opt(o, DIFF_FILE_OLD); const char *add = diff_get_color_opt(o, DIFF_FILE_NEW); - show_submodule_summary(o->file, one->path ? one->path : two->path, - line_prefix, + show_submodule_summary(o, one->path ? one->path : two->path, >oid, >oid, two->dirty_submodule, meta, del, add, reset); @@ -2390,11 +2389,10 @@ static void builtin_diff(const char *name_a, (!two->mode || S_ISGITLINK(two->mode))) { const char *del = diff_get_color_opt(o, DIFF_FILE_OLD); const char *add = diff_get_color_opt(o, DIFF_FILE_NEW); - show_submodule_inline_diff(o->file, one->path ? one->path : two->path, - line_prefix, + show_submodule_inline_diff(o, one->path ? one->path : two->path, >oid, >oid, two->dirty_submodule, - meta, del, add, reset, o); + meta, del, add, reset); return; } diff --git a/diff.h b/diff.h index 5be1ee77a7..addebd5a0f 100644 --- a/diff.h +++ b/diff.h @@ -188,6 +188,11 @@ struct diff_options { int diff_path_counter; }; +void emit_line_fmt(struct diff_options *o, const char *set, const char *reset, + const char *fmt, ...); +void emit_line(struct diff_options *o, const char *set, const char *reset, + const char *line, int len); + enum color_diff { DIFF_RESET = 0, DIFF_CONTEXT = 1, diff --git a/submodule.c b/submodule.c index d3299e29c0..cfad469a2f 100644 --- a/submodule.c +++ b/submodule.c @@ -362,8 +362,8 @@ static int prepare_submodule_summary(struct rev_info *rev, const char *path, return prepare_revision_walk(rev); } -static void print_submodule_summary(struct rev_info *rev, FILE *f, - const char *line_prefix, +static void print_submodule_summary(struct rev_info *rev, + struct diff_options *o, const char *del, const char *add, const char *reset) { static const char format[] = " %m %s"; @@ -375,18 +375,12 @@ static void print_submodule_summary(struct rev_info *rev, FILE *f, ctx.date_mode = rev->date_mode; ctx.output_encoding = get_log_output_encoding(); strbuf_setlen(, 0); - strbuf_addstr(, line_prefix); - if (commit->object.flags & SYMMETRIC_LEFT) { - if (del) - strbuf_addstr(, del); - } - else if (add) - strbuf_addstr(, add); format_commit_message(commit, format, , ); - if (reset) - strbuf_addstr(, reset); strbuf_addch(, '\n'); - fprintf(f, "%s", sb.buf); + if (commit->object.flags & SYMMETRIC_LEFT) + emit_line(o, del,
[PATCH 07/19] diff.c: convert fn_out_consume to use emit_line_*
In a later patch, I want to propose an option to detect moved lines in a diff, which cannot be done in a one-pass over the diff. Instead we need to go over the whole diff twice, because we cannot detect the first line of the two corresponding lines (+ and -) that got moved. So to prepare the diff machinery for two pass algorithms (i.e. buffer it all up and then operate on the result), move all emissions to places, such that the only emitting function is emit_line_0. This covers the parts of fn_out_consume. Signed-off-by: Stefan Beller--- diff.c | 13 + 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/diff.c b/diff.c index aef159a919..93343a9ccc 100644 --- a/diff.c +++ b/diff.c @@ -1289,7 +1289,6 @@ static void fn_out_consume(void *priv, char *line, unsigned long len) const char *context = diff_get_color(ecbdata->color_diff, DIFF_CONTEXT); const char *reset = diff_get_color(ecbdata->color_diff, DIFF_RESET); struct diff_options *o = ecbdata->opt; - const char *line_prefix = diff_line_prefix(o); o->found_changes = 1; @@ -1301,14 +1300,12 @@ static void fn_out_consume(void *priv, char *line, unsigned long len) if (ecbdata->label_path[0]) { const char *name_a_tab, *name_b_tab; - name_a_tab = strchr(ecbdata->label_path[0], ' ') ? "\t" : ""; name_b_tab = strchr(ecbdata->label_path[1], ' ') ? "\t" : ""; - - fprintf(o->file, "%s%s--- %s%s%s\n", - line_prefix, meta, ecbdata->label_path[0], reset, name_a_tab); - fprintf(o->file, "%s%s+++ %s%s%s\n", - line_prefix, meta, ecbdata->label_path[1], reset, name_b_tab); + emit_line_fmt(o, meta, reset, "--- %s%s\n", + ecbdata->label_path[0], name_a_tab); + emit_line_fmt(o, meta, reset, "+++ %s%s\n", + ecbdata->label_path[1], name_b_tab); ecbdata->label_path[0] = ecbdata->label_path[1] = NULL; } @@ -1349,7 +1346,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len) diff_words_flush(ecbdata); if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) { emit_line(o, context, reset, line, len); - fputs("~\n", o->file); + emit_line(o, NULL, NULL, "~\n", 2); } else { /* * Skip the prefix character, if any. With -- 2.13.0.18.g183880de0a
[PATCH 17/19] diff.c: factor out emit_line_ws for coloring whitespaces
Introduce a helper that calls ws_check_emit. We'll have it as a helper as in a later patch we'll add buffering in there. In a later patch we want to buffer up all output and to do that we'll need to keep around information for outputting a line such as the whitespace information. We choose to put this information into a new struct 'buffered_filepair', which later will be persisting longer than builtin_diff. Signed-off-by: Stefan Beller--- diff.c | 18 +- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/diff.c b/diff.c index 899dc69dff..08dcc56bb9 100644 --- a/diff.c +++ b/diff.c @@ -552,6 +552,16 @@ void emit_line(struct diff_options *o, const char *set, const char *reset, emit_line_0(o, set, reset, 0, line, len); } +static void emit_line_ws(struct diff_options *o, +const char *set, const char *reset, int sign, +const char *line, int len, +const char *ws, unsigned ws_rule) +{ + emit_line_0(o, set, reset, sign, "", 0); + ws_check_emit(line, len, ws_rule, + o->file, set, reset, ws); +} + void emit_line_fmt(struct diff_options *o, const char *set, const char *reset, const char *fmt, ...) @@ -598,12 +608,10 @@ static void emit_line_checked(const char *reset, else if (sign == '+' && new_blank_line_at_eof(ecbdata, line, len)) /* Blank line at EOF - paint '+' as well */ emit_line_0(ecbdata->opt, ws, reset, sign, line, len); - else { + else /* Emit just the prefix, then the rest. */ - emit_line_0(ecbdata->opt, set, reset, sign, "", 0); - ws_check_emit(line, len, ecbdata->ws_rule, - ecbdata->opt->file, set, reset, ws); - } + emit_line_ws(ecbdata->opt, set, reset, sign, line, len, +ws, ecbdata->ws_rule); } static void emit_add_line(const char *reset, -- 2.13.0.18.g183880de0a
[PATCH 02/19] diff: move line ending check into emit_hunk_header
In a later patch, I want to propose an option to detect moved lines in a diff, which cannot be done in a one-pass over the diff. Instead we need to go over the whole diff twice, because we cannot detect the first line of the two corresponding lines (+ and -) that got moved. So to prepare the diff machinery for two pass algorithms (i.e. buffer it all up and then operate on the result), move all emissions to places, such that the only emitting function is emit_line_0. This patch moves code that is conceptually part of emit_hunk_header, but was using output in fn_out_consume, back to emit_hunk_header. Meanwhile simplify it by using a function that is designed for it. Signed-off-by: Stefan Beller--- diff.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/diff.c b/diff.c index 3f5bf8b5a4..c2ed605cd0 100644 --- a/diff.c +++ b/diff.c @@ -677,6 +677,8 @@ static void emit_hunk_header(struct emit_callback *ecbdata, } strbuf_add(, line + len, org_len - len); + strbuf_complete_line(); + emit_line(ecbdata->opt, "", "", msgbuf.buf, msgbuf.len); strbuf_release(); } @@ -1315,8 +1317,6 @@ static void fn_out_consume(void *priv, char *line, unsigned long len) len = sane_truncate_line(ecbdata, line, len); find_lno(line, ecbdata); emit_hunk_header(ecbdata, line, len); - if (line[len-1] != '\n') - putc('\n', o->file); return; } -- 2.13.0.18.g183880de0a
Re: checkout -b remotes/origin/ should not work
On Sat, May 13, 2017 at 08:52:37PM -0700, Stefan Beller wrote: > NEEDSWORK: > > checkout -b remotes/origin/ should not work, unless force is > given (maybe?) > > (I just run into that, now I have a remote tracking branch that points > at my detached HEAD. Oh well.) To be pedantic, you have a local branch with a funny name that points to your detached HEAD. :) I think this problem extends beyond "remotes/". The worst is: git checkout -b HEAD but there are many confusing variants: git checkout -b refs/heads/foo git checkout -b tags/v1.0 etc. Those are all perfectly legal names, but almost certainly not what the user intended. I think the plumbing should continue to allow them, but I wouldn't object to some common-sense think-o protections in the "checkout -b" plumbing (especially if it could be disabled for power users). -Peff
Re: [PATCH] interpret-trailers: obey scissors lines
On Sat, May 13, 2017 at 08:39:23PM -0700, Brian Malehorn wrote: > If a commit message is being editted as "verbose", it will contain a > scissors string ("-- >8 --") and a diff: > > my subject > > # >8 > # Do not touch the line above. > # Everything below will be removed. > diff --git a/foo.txt b/foo.txt > index 5716ca5..7601807 100644 > --- a/foo.txt > +++ b/foo.txt > @@ -1 +1 @@ > -bar > +baz > > interpret-trailers doesn't interpret the scissors and therefore places > trailer information after the diff. A simple reproduction is: > > git config commit.verbose true > GIT_EDITOR='git interpret-trailers --in-place --trailer Acked-by:me' \ > git commit --amend > > This commit resolves the issue by teaching "git interpret-trailers" to > obey scissors the same way "git commit" does. Overall, this patch looks good to me. A few comments below. The commit message explains the situation much better than the original. > diff --git a/builtin/commit.c b/builtin/commit.c > index 2de5f6cc6..2ce9c339d 100644 > --- a/builtin/commit.c > +++ b/builtin/commit.c > @@ -1735,7 +1735,8 @@ int cmd_commit(int argc, const char **argv, const char > *prefix) > > if (verbose || /* Truncate the message just before the diff, if any. */ > cleanup_mode == CLEANUP_SCISSORS) > - wt_status_truncate_message_at_cut_line(); > + strbuf_setlen(, > + wt_status_last_nonscissors_index(sb.buf, sb.len)); This hunk surprised me at first (that we would need to touch commit.c at all), but the refactoring makes sense. > @@ -1662,8 +1663,9 @@ int ignore_non_trailer(const char *buf, size_t len) > int boc = 0; > int bol = 0; > int in_old_conflicts_block = 0; > + size_t cutoff = wt_status_last_nonscissors_index(buf, len); > > - while (bol < len) { > + while (bol < cutoff) { > const char *next_line = memchr(buf + bol, '\n', len - bol); > > if (!next_line) > @@ -1689,5 +1691,5 @@ int ignore_non_trailer(const char *buf, size_t len) > } > bol = next_line - buf; > } > - return boc ? len - boc : 0; > + return boc ? len - boc : len - cutoff; > } The change to interpret-trailers here ended up delightfully simple (and looks right to me). > diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh > index 4dd1d7c52..d88d4a4ff 100755 > --- a/t/t7513-interpret-trailers.sh > +++ b/t/t7513-interpret-trailers.sh > @@ -1258,4 +1258,21 @@ test_expect_success 'with no command and no key' ' > test_cmp expected actual > ' > > +test_expect_success 'with scissors' ' > + cat >expected <<-EOF && > + my subject > + > + review: Brian > + sign: A U Thor> + # >8 > + ignore this > + EOF Two minor style nits. One, we'd usually use "\EOF" here unless you really do want to interpolate inside the here document. And two, we usually indent the contents to the same level as the outer cat/EOF pair (I actually don't mind at all how yours looks, but I just happened to notice that it is slightly unlike our usual style). > diff --git a/wt-status.c b/wt-status.c > index 4bb46781c..8b807d11f 100644 > --- a/wt-status.c > +++ b/wt-status.c > @@ -883,17 +883,18 @@ static void wt_longstatus_print_other(struct wt_status > *s, > status_printf_ln(s, GIT_COLOR_NORMAL, "%s", ""); > } > > -void wt_status_truncate_message_at_cut_line(struct strbuf *buf) > +size_t wt_status_last_nonscissors_index(const char *s, size_t len) I can see how the refactor changes this function (and it makes sense), but the "last nonscissors index" seems like a funny term. It is really the length, isn't, and therefore one past the last nonscissors index (or another way of putting it: it's the first index of the scissors). I wonder if it makes sense to call it "length". Another way to think of it is still as a truncation. Our strip_suffix() helper behaves quite similarly to this (not actually writing into the buffer, but returning the new length). Perhaps something like "wt_status_strip_scissors" would work. -Peff
checkout -b remotes/origin/ should not work
NEEDSWORK: checkout -b remotes/origin/ should not work, unless force is given (maybe?) (I just run into that, now I have a remote tracking branch that points at my detached HEAD. Oh well.)
Re: [PATCH 0/3] interpret-trailers + commit -v bugfix
> As I said, I'm a little iffy on doing this unconditionally, but it may > be the least-bad solution. I'd just worry about collateral damage to > somebody who doesn't use commit.verbose, but has something scissors-like > in their commit message. > > If you were to switch out is_scissors_line() for checking the exact > cut_line[] from wt-status.c, I think that would be a big improvement. > We'd still have the possibility of a false positive, but it would be > much less likely in practice. Yes, you're probably right. Using is_scissors_line() was the path of least resistance to fix my bug, but wasn't really the Right Thing. I've made a new patch that uses the wt-status.c code to determine scissors lines. "git interpret-trailers" now uses the same logic as "git commit". This patch replaces the original 3. And yeah, this is yet another heuristic in interpret-trailers aimed at git commit messages. But it's hardly the first heuristic we've added, and I'd say it makes more sense for interpret-trailers and commit to parse the same format.
[PATCH] interpret-trailers: obey scissors lines
If a commit message is being editted as "verbose", it will contain a scissors string ("-- >8 --") and a diff: my subject # >8 # Do not touch the line above. # Everything below will be removed. diff --git a/foo.txt b/foo.txt index 5716ca5..7601807 100644 --- a/foo.txt +++ b/foo.txt @@ -1 +1 @@ -bar +baz interpret-trailers doesn't interpret the scissors and therefore places trailer information after the diff. A simple reproduction is: git config commit.verbose true GIT_EDITOR='git interpret-trailers --in-place --trailer Acked-by:me' \ git commit --amend This commit resolves the issue by teaching "git interpret-trailers" to obey scissors the same way "git commit" does. --- builtin/commit.c | 3 ++- commit.c | 6 -- t/t7513-interpret-trailers.sh | 17 + wt-status.c | 11 ++- wt-status.h | 2 +- 5 files changed, 30 insertions(+), 9 deletions(-) diff --git a/builtin/commit.c b/builtin/commit.c index 2de5f6cc6..2ce9c339d 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -1735,7 +1735,8 @@ int cmd_commit(int argc, const char **argv, const char *prefix) if (verbose || /* Truncate the message just before the diff, if any. */ cleanup_mode == CLEANUP_SCISSORS) - wt_status_truncate_message_at_cut_line(); + strbuf_setlen(, + wt_status_last_nonscissors_index(sb.buf, sb.len)); if (cleanup_mode != CLEANUP_NONE) strbuf_stripspace(, cleanup_mode == CLEANUP_ALL); diff --git a/commit.c b/commit.c index fab826973..5d791b703 100644 --- a/commit.c +++ b/commit.c @@ -11,6 +11,7 @@ #include "commit-slab.h" #include "prio-queue.h" #include "sha1-lookup.h" +#include "wt-status.h" static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **); @@ -1662,8 +1663,9 @@ int ignore_non_trailer(const char *buf, size_t len) int boc = 0; int bol = 0; int in_old_conflicts_block = 0; + size_t cutoff = wt_status_last_nonscissors_index(buf, len); - while (bol < len) { + while (bol < cutoff) { const char *next_line = memchr(buf + bol, '\n', len - bol); if (!next_line) @@ -1689,5 +1691,5 @@ int ignore_non_trailer(const char *buf, size_t len) } bol = next_line - buf; } - return boc ? len - boc : 0; + return boc ? len - boc : len - cutoff; } diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh index 4dd1d7c52..d88d4a4ff 100755 --- a/t/t7513-interpret-trailers.sh +++ b/t/t7513-interpret-trailers.sh @@ -1258,4 +1258,21 @@ test_expect_success 'with no command and no key' ' test_cmp expected actual ' +test_expect_success 'with scissors' ' + cat >expected <<-EOF && + my subject + + review: Brian + sign: A U Thor+ # >8 + ignore this + EOF + git interpret-trailers --trailer review:Brian >actual <<-EOF && + my subject + # >8 + ignore this + EOF + test_cmp expected actual +' + test_done diff --git a/wt-status.c b/wt-status.c index 4bb46781c..8b807d11f 100644 --- a/wt-status.c +++ b/wt-status.c @@ -883,17 +883,18 @@ static void wt_longstatus_print_other(struct wt_status *s, status_printf_ln(s, GIT_COLOR_NORMAL, "%s", ""); } -void wt_status_truncate_message_at_cut_line(struct strbuf *buf) +size_t wt_status_last_nonscissors_index(const char *s, size_t len) { const char *p; struct strbuf pattern = STRBUF_INIT; strbuf_addf(, "\n%c %s", comment_line_char, cut_line); - if (starts_with(buf->buf, pattern.buf + 1)) - strbuf_setlen(buf, 0); - else if ((p = strstr(buf->buf, pattern.buf))) - strbuf_setlen(buf, p - buf->buf + 1); + if (starts_with(s, pattern.buf + 1)) + len = 0; + else if ((p = strstr(s, pattern.buf))) + len = p - s + 1; strbuf_release(); + return len; } void wt_status_add_cut_line(FILE *fp) diff --git a/wt-status.h b/wt-status.h index 54fec7703..36a21b492 100644 --- a/wt-status.h +++ b/wt-status.h @@ -112,7 +112,7 @@ struct wt_status_state { unsigned char cherry_pick_head_sha1[20]; }; -void wt_status_truncate_message_at_cut_line(struct strbuf *); +size_t wt_status_last_nonscissors_index(const char *s, size_t len); void wt_status_add_cut_line(FILE *fp); void wt_status_prepare(struct wt_status *s); void wt_status_print(struct wt_status *s); -- 2.12.3.9.g050893b
[RFC PATCH v2 22/22] blame: create entry prepend function in libgit
Signed-off-by: Jeff Smith--- blame.c | 16 blame.h | 2 ++ builtin/blame.c | 11 +-- 3 files changed, 19 insertions(+), 10 deletions(-) diff --git a/blame.c b/blame.c index f6c9cb7..00404b9 100644 --- a/blame.c +++ b/blame.c @@ -1845,3 +1845,19 @@ void setup_scoreboard(struct blame_scoreboard *sb, const char *path, struct blam if (orig) *orig = o; } + + + +struct blame_entry *blame_entry_prepend(struct blame_entry *head, + long start, long end, + struct blame_origin *o) +{ + struct blame_entry *new_head = xcalloc(1, sizeof(struct blame_entry)); + new_head->lno = start; + new_head->num_lines = end - start; + new_head->suspect = o; + new_head->s_lno = start; + new_head->next = head; + blame_origin_incref(o); + return new_head; +} diff --git a/blame.h b/blame.h index 76fd8ef..a6c915c 100644 --- a/blame.h +++ b/blame.h @@ -170,4 +170,6 @@ extern const char *blame_nth_line(struct blame_scoreboard *sb, long lno); extern void init_scoreboard(struct blame_scoreboard *sb); extern void setup_scoreboard(struct blame_scoreboard *sb, const char *path, struct blame_origin **orig); +extern struct blame_entry *blame_entry_prepend(struct blame_entry *head, long start, long end, struct blame_origin *o); + #endif /* BLAME_H */ diff --git a/builtin/blame.c b/builtin/blame.c index 7ec5a73..8a858b0 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -890,16 +890,7 @@ int cmd_blame(int argc, const char **argv, const char *prefix) for (range_i = ranges.nr; range_i > 0; --range_i) { const struct range *r = [range_i - 1]; - long bottom = r->start; - long top = r->end; - struct blame_entry *next = ent; - ent = xcalloc(1, sizeof(*ent)); - ent->lno = bottom; - ent->num_lines = top - bottom; - ent->suspect = o; - ent->s_lno = bottom; - ent->next = next; - blame_origin_incref(o); + ent = blame_entry_prepend(ent, r->start, r->end, o); } o->suspects = ent; -- 2.9.3
[RFC PATCH v2 12/22] blame: move no_whole_file_rename flag to scoreboard
The no_whole_file_rename flag is used in parts of blame that are being moved to libgit, and should be accessible via the scoreboard structure. Signed-off-by: Jeff Smith--- blame.h | 1 + builtin/blame.c | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/blame.h b/blame.h index 42a948d..c140f41 100644 --- a/blame.h +++ b/blame.h @@ -133,6 +133,7 @@ struct blame_scoreboard { int reverse; int show_root; int xdl_opts; + int no_whole_file_rename; }; #endif /* BLAME_H */ diff --git a/builtin/blame.c b/builtin/blame.c index d3af02b..bd295eb 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -1285,7 +1285,7 @@ static void pass_blame(struct blame_scoreboard *sb, struct blame_origin *origin, * The first pass looks for unrenamed path to optimize for * common cases, then we look for renames in the second pass. */ - for (pass = 0; pass < 2 - no_whole_file_rename; pass++) { + for (pass = 0; pass < 2 - sb->no_whole_file_rename; pass++) { struct blame_origin *(*find)(struct commit *, struct blame_origin *); find = pass ? find_rename : find_origin; @@ -2763,6 +2763,7 @@ int cmd_blame(int argc, const char **argv, const char *prefix) sb.show_root = show_root; sb.xdl_opts = xdl_opts; + sb.no_whole_file_rename = no_whole_file_rename; read_mailmap(, NULL); -- 2.9.3
[RFC PATCH v2 16/22] blame: rework methods that determine 'final' commit
Either prepare_initial or prepare_final is used to determine which commit is marked as 'final'. Call the underlying methods directly to make this more clear. Signed-off-by: Jeff Smith--- builtin/blame.c | 49 +++-- 1 file changed, 23 insertions(+), 26 deletions(-) diff --git a/builtin/blame.c b/builtin/blame.c index 94c9ded..3d85f62 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -2297,14 +2297,8 @@ static struct commit *find_single_final(struct rev_info *revs, return found; } -static char *prepare_final(struct blame_scoreboard *sb) -{ - const char *name; - sb->final = find_single_final(sb->revs, ); - return xstrdup_or_null(name); -} - -static const char *dwim_reverse_initial(struct blame_scoreboard *sb) +static struct commit *dwim_reverse_initial(struct rev_info *revs, + const char **name_p) { /* * DWIM "git blame --reverse ONE -- PATH" as @@ -2315,11 +2309,11 @@ static const char *dwim_reverse_initial(struct blame_scoreboard *sb) struct commit *head_commit; unsigned char head_sha1[20]; - if (sb->revs->pending.nr != 1) + if (revs->pending.nr != 1) return NULL; /* Is that sole rev a committish? */ - obj = sb->revs->pending.objects[0].item; + obj = revs->pending.objects[0].item; obj = deref_tag(obj, NULL, 0); if (obj->type != OBJ_COMMIT) return NULL; @@ -2333,17 +2327,19 @@ static const char *dwim_reverse_initial(struct blame_scoreboard *sb) /* Turn "ONE" into "ONE..HEAD" then */ obj->flags |= UNINTERESTING; - add_pending_object(sb->revs, _commit->object, "HEAD"); + add_pending_object(revs, _commit->object, "HEAD"); - sb->final = (struct commit *)obj; - return sb->revs->pending.objects[0].name; + if (name_p) + *name_p = revs->pending.objects[0].name; + return (struct commit *)obj; } -static char *prepare_initial(struct blame_scoreboard *sb) +static struct commit *find_single_initial(struct rev_info *revs, + const char **name_p) { int i; const char *final_commit_name = NULL; - struct rev_info *revs = sb->revs; + struct commit *found = NULL; /* * There must be one and only one negative commit, and it must be @@ -2356,19 +2352,22 @@ static char *prepare_initial(struct blame_scoreboard *sb) obj = deref_tag(obj, NULL, 0); if (obj->type != OBJ_COMMIT) die("Non commit %s?", revs->pending.objects[i].name); - if (sb->final) + if (found) die("More than one commit to dig up from, %s and %s?", revs->pending.objects[i].name, final_commit_name); - sb->final = (struct commit *) obj; + found = (struct commit *) obj; final_commit_name = revs->pending.objects[i].name; } if (!final_commit_name) - final_commit_name = dwim_reverse_initial(sb); + found = dwim_reverse_initial(revs, _commit_name); if (!final_commit_name) die("No commit to dig up from?"); - return xstrdup(final_commit_name); + + if (name_p) + *name_p = final_commit_name; + return found; } static int blame_copy_callback(const struct option *option, const char *arg, int unset) @@ -2412,7 +2411,7 @@ int cmd_blame(int argc, const char **argv, const char *prefix) struct blame_origin *o; struct blame_entry *ent = NULL; long dashdash_pos, lno; - char *final_commit_name = NULL; + const char *final_commit_name = NULL; enum object_type type; struct commit *final_commit = NULL; struct progress_info pi = { NULL, 0 }; @@ -2621,14 +2620,15 @@ int cmd_blame(int argc, const char **argv, const char *prefix) sb.revs = sb.contents_from = contents_from; sb.reverse = reverse; + if (!reverse) { - final_commit_name = prepare_final(); + sb.final = find_single_final(, _commit_name); sb.commits.compare = compare_commits_by_commit_date; } else if (contents_from) die(_("--contents and --reverse do not blend well.")); else { - final_commit_name = prepare_initial(); + sb.final = find_single_initial(, _commit_name); sb.commits.compare = compare_commits_by_reverse_commit_date; if (revs.first_parent_only) revs.children.name = NULL; @@ -2783,10 +2783,7 @@ int cmd_blame(int argc, const char **argv, const char *prefix) if (!incremental) setup_pager(); - - free(final_commit_name);
[RFC PATCH v2 14/22] blame: move progess updates to a scoreboard callback
Allow the interface user to decide how to handle a progress update. Signed-off-by: Jeff Smith--- blame.h | 3 +++ builtin/blame.c | 24 ++-- 2 files changed, 17 insertions(+), 10 deletions(-) diff --git a/blame.h b/blame.h index e85db06..a0bd91b 100644 --- a/blame.h +++ b/blame.h @@ -138,6 +138,9 @@ struct blame_scoreboard { /* callbacks */ void(*on_sanity_fail)(struct blame_scoreboard *, int); + void(*found_guilty_entry)(struct blame_entry *, void *); + + void *found_guilty_entry_data; }; #endif /* BLAME_H */ diff --git a/builtin/blame.c b/builtin/blame.c index d7b3b5a..a40f743 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -1598,9 +1598,10 @@ static int emit_one_suspect_detail(struct blame_origin *suspect, int repeat) * The blame_entry is found to be guilty for the range. * Show it in incremental output. */ -static void found_guilty_entry(struct blame_entry *ent, - struct progress_info *pi) +static void found_guilty_entry(struct blame_entry *ent, void *data) { + struct progress_info *pi = (struct progress_info *)data; + if (incremental) { struct blame_origin *suspect = ent->suspect; @@ -1623,11 +1624,6 @@ static void assign_blame(struct blame_scoreboard *sb, int opt) { struct rev_info *revs = sb->revs; struct commit *commit = prio_queue_get(>commits); - struct progress_info pi = { NULL, 0 }; - - if (show_progress) - pi.progress = start_progress_delay(_("Blaming lines"), - sb->num_lines, 50, 1); while (commit) { struct blame_entry *ent; @@ -1669,7 +1665,8 @@ static void assign_blame(struct blame_scoreboard *sb, int opt) suspect->guilty = 1; for (;;) { struct blame_entry *next = ent->next; - found_guilty_entry(ent, ); + if (sb->found_guilty_entry) + sb->found_guilty_entry(ent, sb->found_guilty_entry_data); if (next) { ent = next; continue; @@ -1685,8 +1682,6 @@ static void assign_blame(struct blame_scoreboard *sb, int opt) if (sb->debug) /* sanity */ sanity_check_refcnt(sb); } - - stop_progress(); } static const char *format_time(unsigned long time, const char *tz_str, @@ -2419,6 +2414,7 @@ int cmd_blame(int argc, const char **argv, const char *prefix) char *final_commit_name = NULL; enum object_type type; struct commit *final_commit = NULL; + struct progress_info pi = { NULL, 0 }; struct string_list range_list = STRING_LIST_INIT_NODUP; int output_option = 0, opt = 0; @@ -2774,8 +2770,16 @@ int cmd_blame(int argc, const char **argv, const char *prefix) read_mailmap(, NULL); + sb.found_guilty_entry = _guilty_entry; + sb.found_guilty_entry_data = + if (show_progress) + pi.progress = start_progress_delay(_("Blaming lines"), + sb.num_lines, 50, 1); + assign_blame(, opt); + stop_progress(); + if (!incremental) setup_pager(); -- 2.9.3
[RFC PATCH v2 15/22] blame: wrap blame_sort and compare_blame_final
The new method's interface is marginally cleaner than blame_sort, and will avoid the need to expose the compare_blame_final method. Signed-off-by: Jeff Smith--- builtin/blame.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/builtin/blame.c b/builtin/blame.c index a40f743..94c9ded 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -249,12 +249,6 @@ static int compare_blame_suspect(const void *p1, const void *p2) return s1->s_lno > s2->s_lno ? 1 : -1; } -static struct blame_entry *blame_sort(struct blame_entry *head, - int (*compare_fn)(const void *, const void *)) -{ - return llist_mergesort (head, get_next_blame, set_next_blame, compare_fn); -} - static int compare_commits_by_reverse_commit_date(const void *a, const void *b, void *c) @@ -262,6 +256,12 @@ static int compare_commits_by_reverse_commit_date(const void *a, return -compare_commits_by_commit_date(a, b, c); } +static void blame_sort_final(struct blame_scoreboard *sb) +{ + sb->ent = llist_mergesort(sb->ent, get_next_blame, set_next_blame, + compare_blame_final); +} + static void sanity_check_refcnt(struct blame_scoreboard *); /* @@ -1244,7 +1244,8 @@ static int num_scapegoats(struct rev_info *revs, struct commit *commit, int reve */ static void distribute_blame(struct blame_scoreboard *sb, struct blame_entry *blamed) { - blamed = blame_sort(blamed, compare_blame_suspect); + blamed = llist_mergesort(blamed, get_next_blame, set_next_blame, +compare_blame_suspect); while (blamed) { struct blame_origin *porigin = blamed->suspect; @@ -2788,7 +2789,7 @@ int cmd_blame(int argc, const char **argv, const char *prefix) if (incremental) return 0; - sb.ent = blame_sort(sb.ent, compare_blame_final); + blame_sort_final(); coalesce(); -- 2.9.3
[RFC PATCH v2 17/22] blame: move origin-related methods to libgit
Signed-off-by: Jeff Smith--- Makefile| 1 + blame.c | 62 + blame.h | 15 +++ builtin/blame.c | 120 4 files changed, 102 insertions(+), 96 deletions(-) create mode 100644 blame.c diff --git a/Makefile b/Makefile index e35542e..2d795ed 100644 --- a/Makefile +++ b/Makefile @@ -718,6 +718,7 @@ LIB_OBJS += argv-array.o LIB_OBJS += attr.o LIB_OBJS += base85.o LIB_OBJS += bisect.o +LIB_OBJS += blame.o LIB_OBJS += blob.o LIB_OBJS += branch.o LIB_OBJS += bulk-checkin.o diff --git a/blame.c b/blame.c new file mode 100644 index 000..4855d6d --- /dev/null +++ b/blame.c @@ -0,0 +1,62 @@ +#include "blame.h" + +void blame_origin_decref(struct blame_origin *o) +{ + if (o && --o->refcnt <= 0) { + struct blame_origin *p, *l = NULL; + if (o->previous) + blame_origin_decref(o->previous); + free(o->file.ptr); + /* Should be present exactly once in commit chain */ + for (p = o->commit->util; p; l = p, p = p->next) { + if (p == o) { + if (l) + l->next = p->next; + else + o->commit->util = p->next; + free(o); + return; + } + } + die("internal error in blame_origin_decref"); + } +} + +/* + * Given a commit and a path in it, create a new origin structure. + * The callers that add blame to the scoreboard should use + * get_origin() to obtain shared, refcounted copy instead of calling + * this function directly. + */ +struct blame_origin *make_origin(struct commit *commit, const char *path) +{ + struct blame_origin *o; + FLEX_ALLOC_STR(o, path, path); + o->commit = commit; + o->refcnt = 1; + o->next = commit->util; + commit->util = o; + return o; +} + +/* + * Locate an existing origin or create a new one. + * This moves the origin to front position in the commit util list. + */ +struct blame_origin *get_origin(struct commit *commit, const char *path) +{ + struct blame_origin *o, *l; + + for (o = commit->util, l = NULL; o; l = o, o = o->next) { + if (!strcmp(o->path, path)) { + /* bump to front */ + if (l) { + l->next = o->next; + o->next = commit->util; + commit->util = o; + } + return blame_origin_incref(o); + } + } + return make_origin(commit, path); +} diff --git a/blame.h b/blame.h index a0bd91b..39f8865 100644 --- a/blame.h +++ b/blame.h @@ -143,4 +143,19 @@ struct blame_scoreboard { void *found_guilty_entry_data; }; +/* + * Origin is refcounted and usually we keep the blob contents to be + * reused. + */ +static inline struct blame_origin *blame_origin_incref(struct blame_origin *o) +{ + if (o) + o->refcnt++; + return o; +} +extern void blame_origin_decref(struct blame_origin *o); + +extern struct blame_origin *make_origin(struct commit *commit, const char *path); +extern struct blame_origin *get_origin(struct commit *commit, const char *path); + #endif /* BLAME_H */ diff --git a/builtin/blame.c b/builtin/blame.c index 3d85f62..b60d604 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -122,39 +122,6 @@ static void fill_origin_blob(struct diff_options *opt, *file = o->file; } -/* - * Origin is refcounted and usually we keep the blob contents to be - * reused. - */ -static inline struct blame_origin *origin_incref(struct blame_origin *o) -{ - if (o) - o->refcnt++; - return o; -} - -static void origin_decref(struct blame_origin *o) -{ - if (o && --o->refcnt <= 0) { - struct blame_origin *p, *l = NULL; - if (o->previous) - origin_decref(o->previous); - free(o->file.ptr); - /* Should be present exactly once in commit chain */ - for (p = o->commit->util; p; l = p, p = p->next) { - if (p == o) { - if (l) - l->next = p->next; - else - o->commit->util = p->next; - free(o); - return; - } - } - die("internal error in blame::origin_decref"); - } -} - static void drop_origin_blob(struct blame_origin *o) { if (o->file.ptr) { @@ -278,7 +245,7 @@ static void coalesce(struct blame_scoreboard *sb)
[RFC PATCH v2 20/22] blame: create scoreboard init function in libgit
Signed-off-by: Jeff Smith--- blame.c | 7 +++ blame.h | 2 ++ builtin/blame.c | 4 +--- 3 files changed, 10 insertions(+), 3 deletions(-) diff --git a/blame.c b/blame.c index 798e61b..17ebf64 100644 --- a/blame.c +++ b/blame.c @@ -1574,3 +1574,10 @@ void assign_blame(struct blame_scoreboard *sb, int opt) sanity_check_refcnt(sb); } } + +void init_scoreboard(struct blame_scoreboard *sb) +{ + memset(sb, 0, sizeof(struct blame_scoreboard)); + sb->move_score = BLAME_DEFAULT_MOVE_SCORE; + sb->copy_score = BLAME_DEFAULT_COPY_SCORE; +} diff --git a/blame.h b/blame.h index a3ea677..9477b6e 100644 --- a/blame.h +++ b/blame.h @@ -171,4 +171,6 @@ extern unsigned blame_entry_score(struct blame_scoreboard *sb, struct blame_entr extern void assign_blame(struct blame_scoreboard *sb, int opt); extern const char *blame_nth_line(struct blame_scoreboard *sb, long lno); +extern void init_scoreboard(struct blame_scoreboard *sb); + #endif /* BLAME_H */ diff --git a/builtin/blame.c b/builtin/blame.c index 207a74d..e01265a 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -1026,10 +1026,8 @@ int cmd_blame(int argc, const char **argv, const char *prefix) revs.disable_stdin = 1; setup_revisions(argc, argv, , NULL); - memset(, 0, sizeof(sb)); - sb.move_score = BLAME_DEFAULT_MOVE_SCORE; - sb.copy_score = BLAME_DEFAULT_COPY_SCORE; + init_scoreboard(); sb.revs = sb.contents_from = contents_from; sb.reverse = reverse; -- 2.9.3
[RFC PATCH v2 13/22] blame: make sanity_check use a callback in scoreboard
Allow the interface user to decide how to handle a failed sanity check, whether that be to output with the current state or to do nothing. Signed-off-by: Jeff Smith--- blame.h | 4 builtin/blame.c | 23 +++ 2 files changed, 19 insertions(+), 8 deletions(-) diff --git a/blame.h b/blame.h index c140f41..e85db06 100644 --- a/blame.h +++ b/blame.h @@ -134,6 +134,10 @@ struct blame_scoreboard { int show_root; int xdl_opts; int no_whole_file_rename; + int debug; + + /* callbacks */ + void(*on_sanity_fail)(struct blame_scoreboard *, int); }; #endif /* BLAME_H */ diff --git a/builtin/blame.c b/builtin/blame.c index bd295eb..d7b3b5a 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -285,7 +285,7 @@ static void coalesce(struct blame_scoreboard *sb) } } - if (DEBUG) /* sanity */ + if (sb->debug) /* sanity */ sanity_check_refcnt(sb); } @@ -1682,7 +1682,7 @@ static void assign_blame(struct blame_scoreboard *sb, int opt) } origin_decref(suspect); - if (DEBUG) /* sanity */ + if (sb->debug) /* sanity */ sanity_check_refcnt(sb); } @@ -2021,12 +2021,16 @@ static void sanity_check_refcnt(struct blame_scoreboard *sb) baa = 1; } } - if (baa) { - int opt = 0160; - find_alignment(sb, ); - output(sb, opt); - die("Baa %d!", baa); - } + if (baa) + sb->on_sanity_fail(sb, baa); +} + +static void sanity_check_on_fail(struct blame_scoreboard *sb, int baa) +{ + int opt = OUTPUT_SHOW_SCORE | OUTPUT_SHOW_NUMBER | OUTPUT_SHOW_NAME; + find_alignment(sb, ); + output(sb, opt); + die("Baa %d!", baa); } static unsigned parse_score(const char *arg) @@ -2761,6 +2765,9 @@ int cmd_blame(int argc, const char **argv, const char *prefix) if (blame_copy_score) sb.copy_score = blame_copy_score; + sb.debug = DEBUG; + sb.on_sanity_fail = _check_on_fail; + sb.show_root = show_root; sb.xdl_opts = xdl_opts; sb.no_whole_file_rename = no_whole_file_rename; -- 2.9.3
[RFC PATCH v2 18/22] blame: move fake-commit-related methods to libgit
Signed-off-by: Jeff Smith--- blame.c | 203 +++- blame.h | 4 +- builtin/blame.c | 197 -- 3 files changed, 205 insertions(+), 199 deletions(-) diff --git a/blame.c b/blame.c index 4855d6d..8c025bc 100644 --- a/blame.c +++ b/blame.c @@ -1,3 +1,6 @@ +#include "cache.h" +#include "refs.h" +#include "cache-tree.h" #include "blame.h" void blame_origin_decref(struct blame_origin *o) @@ -28,7 +31,7 @@ void blame_origin_decref(struct blame_origin *o) * get_origin() to obtain shared, refcounted copy instead of calling * this function directly. */ -struct blame_origin *make_origin(struct commit *commit, const char *path) +static struct blame_origin *make_origin(struct commit *commit, const char *path) { struct blame_origin *o; FLEX_ALLOC_STR(o, path, path); @@ -60,3 +63,201 @@ struct blame_origin *get_origin(struct commit *commit, const char *path) } return make_origin(commit, path); } + + + +static void verify_working_tree_path(struct commit *work_tree, const char *path) +{ + struct commit_list *parents; + int pos; + + for (parents = work_tree->parents; parents; parents = parents->next) { + const struct object_id *commit_oid = >item->object.oid; + struct object_id blob_oid; + unsigned mode; + + if (!get_tree_entry(commit_oid->hash, path, blob_oid.hash, ) && + sha1_object_info(blob_oid.hash, NULL) == OBJ_BLOB) + return; + } + + pos = cache_name_pos(path, strlen(path)); + if (pos >= 0) + ; /* path is in the index */ + else if (-1 - pos < active_nr && +!strcmp(active_cache[-1 - pos]->name, path)) + ; /* path is in the index, unmerged */ + else + die("no such path '%s' in HEAD", path); +} + +static struct commit_list **append_parent(struct commit_list **tail, const struct object_id *oid) +{ + struct commit *parent; + + parent = lookup_commit_reference(oid->hash); + if (!parent) + die("no such commit %s", oid_to_hex(oid)); + return _list_insert(parent, tail)->next; +} + +static void append_merge_parents(struct commit_list **tail) +{ + int merge_head; + struct strbuf line = STRBUF_INIT; + + merge_head = open(git_path_merge_head(), O_RDONLY); + if (merge_head < 0) { + if (errno == ENOENT) + return; + die("cannot open '%s' for reading", git_path_merge_head()); + } + + while (!strbuf_getwholeline_fd(, merge_head, '\n')) { + struct object_id oid; + if (line.len < GIT_SHA1_HEXSZ || get_oid_hex(line.buf, )) + die("unknown line in '%s': %s", git_path_merge_head(), line.buf); + tail = append_parent(tail, ); + } + close(merge_head); + strbuf_release(); +} + +/* + * This isn't as simple as passing sb->buf and sb->len, because we + * want to transfer ownership of the buffer to the commit (so we + * must use detach). + */ +static void set_commit_buffer_from_strbuf(struct commit *c, struct strbuf *sb) +{ + size_t len; + void *buf = strbuf_detach(sb, ); + set_commit_buffer(c, buf, len); +} + +/* + * Prepare a dummy commit that represents the work tree (or staged) item. + * Note that annotating work tree item never works in the reverse. + */ +struct commit *fake_working_tree_commit(struct diff_options *opt, + const char *path, + const char *contents_from) +{ + struct commit *commit; + struct blame_origin *origin; + struct commit_list **parent_tail, *parent; + struct object_id head_oid; + struct strbuf buf = STRBUF_INIT; + const char *ident; + time_t now; + int size, len; + struct cache_entry *ce; + unsigned mode; + struct strbuf msg = STRBUF_INIT; + + read_cache(); + time(); + commit = alloc_commit_node(); + commit->object.parsed = 1; + commit->date = now; + parent_tail = >parents; + + if (!resolve_ref_unsafe("HEAD", RESOLVE_REF_READING, head_oid.hash, NULL)) + die("no such ref: HEAD"); + + parent_tail = append_parent(parent_tail, _oid); + append_merge_parents(parent_tail); + verify_working_tree_path(commit, path); + + origin = make_origin(commit, path); + + ident = fmt_ident("Not Committed Yet", "not.committed.yet", NULL, 0); + strbuf_addstr(, "tree \n"); + for (parent = commit->parents; parent; parent = parent->next) + strbuf_addf(, "parent %s\n", + oid_to_hex(>item->object.oid)); + strbuf_addf(, +
[RFC PATCH v2 04/22] blame: move origin and entry structures to header
The origin and blame_entry structures are core to the blame interface and reference each other. Since origin will be more exposed, rename it to blame_origin to clarify what it is a part of. Signed-off-by: Jeff Smith--- blame.h | 86 ++ builtin/blame.c | 185 +--- 2 files changed, 140 insertions(+), 131 deletions(-) create mode 100644 blame.h diff --git a/blame.h b/blame.h new file mode 100644 index 000..f52d0fc --- /dev/null +++ b/blame.h @@ -0,0 +1,86 @@ +#ifndef BLAME_H +#define BLAME_H + +#include "cache.h" +#include "commit.h" +#include "xdiff-interface.h" + +/* + * One blob in a commit that is being suspected + */ +struct blame_origin { + int refcnt; + /* Record preceding blame record for this blob */ + struct blame_origin *previous; + /* origins are put in a list linked via `next' hanging off the +* corresponding commit's util field in order to make finding +* them fast. The presence in this chain does not count +* towards the origin's reference count. It is tempting to +* let it count as long as the commit is pending examination, +* but even under circumstances where the commit will be +* present multiple times in the priority queue of unexamined +* commits, processing the first instance will not leave any +* work requiring the origin data for the second instance. An +* interspersed commit changing that would have to be +* preexisting with a different ancestry and with the same +* commit date in order to wedge itself between two instances +* of the same commit in the priority queue _and_ produce +* blame entries relevant for it. While we don't want to let +* us get tripped up by this case, it certainly does not seem +* worth optimizing for. +*/ + struct blame_origin *next; + struct commit *commit; + /* `suspects' contains blame entries that may be attributed to +* this origin's commit or to parent commits. When a commit +* is being processed, all suspects will be moved, either by +* assigning them to an origin in a different commit, or by +* shipping them to the scoreboard's ent list because they +* cannot be attributed to a different commit. +*/ + struct blame_entry *suspects; + mmfile_t file; + struct object_id blob_oid; + unsigned mode; + /* guilty gets set when shipping any suspects to the final +* blame list instead of other commits +*/ + char guilty; + char path[FLEX_ARRAY]; +}; + +/* + * Each group of lines is described by a blame_entry; it can be split + * as we pass blame to the parents. They are arranged in linked lists + * kept as `suspects' of some unprocessed origin, or entered (when the + * blame origin has been finalized) into the scoreboard structure. + * While the scoreboard structure is only sorted at the end of + * processing (according to final image line number), the lists + * attached to an origin are sorted by the target line number. + */ +struct blame_entry { + struct blame_entry *next; + + /* the first line of this group in the final image; +* internally all line numbers are 0 based. +*/ + int lno; + + /* how many lines this group has */ + int num_lines; + + /* the commit that introduced this group into the final image */ + struct blame_origin *suspect; + + /* the line number of the first line of this group in the +* suspect's file; internally all line numbers are 0 based. +*/ + int s_lno; + + /* how significant this entry is -- cached to avoid +* scanning the lines over and over. +*/ + unsigned score; +}; + +#endif /* BLAME_H */ diff --git a/builtin/blame.c b/builtin/blame.c index e30b3ef..caa5ee0 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -28,6 +28,7 @@ #include "line-log.h" #include "dir.h" #include "progress.h" +#include "blame.h" static char blame_usage[] = N_("git blame [] [] [] [--] "); @@ -84,50 +85,6 @@ static unsigned blame_copy_score; #define METAINFO_SHOWN (1u<<12) #define MORE_THAN_ONE_PATH (1u<<13) -/* - * One blob in a commit that is being suspected - */ -struct origin { - int refcnt; - /* Record preceding blame record for this blob */ - struct origin *previous; - /* origins are put in a list linked via `next' hanging off the -* corresponding commit's util field in order to make finding -* them fast. The presence in this chain does not count -* towards the origin's reference count. It is tempting to -* let it count as long as the commit is pending examination, -* but even under circumstances where the commit will be -* present multiple times in the
[RFC PATCH v2 08/22] blame: move contents_from to scoreboard
The argument from --contents is used in parts of blame that are being moved to libgit, and should be accessible via the scoreboard structure. Signed-off-by: Jeff Smith--- blame.h | 3 +++ builtin/blame.c | 1 + 2 files changed, 4 insertions(+) diff --git a/blame.h b/blame.h index fde7d1d..388309d 100644 --- a/blame.h +++ b/blame.h @@ -125,6 +125,9 @@ struct blame_scoreboard { */ unsigned move_score; unsigned copy_score; + + /* use this file's contents as the final image */ + const char *contents_from; }; #endif /* BLAME_H */ diff --git a/builtin/blame.c b/builtin/blame.c index 949e179..032fd15 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -2617,6 +2617,7 @@ int cmd_blame(int argc, const char **argv, const char *prefix) sb.copy_score = BLAME_DEFAULT_COPY_SCORE; sb.revs = + sb.contents_from = contents_from; if (!reverse) { final_commit_name = prepare_final(); sb.commits.compare = compare_commits_by_commit_date; -- 2.9.3
[RFC PATCH v2 07/22] blame: move copy/move thresholds to scoreboard
Copy and move score thresholds are used in parts of blame that are being moved to libgit, and should be accessible via the scoreboard structure. Signed-off-by: Jeff Smith--- blame.h | 10 ++ builtin/blame.c | 36 2 files changed, 26 insertions(+), 20 deletions(-) diff --git a/blame.h b/blame.h index 3438052..fde7d1d 100644 --- a/blame.h +++ b/blame.h @@ -7,6 +7,9 @@ #include "revision.h" #include "prio-queue.h" +#define BLAME_DEFAULT_MOVE_SCORE 20 +#define BLAME_DEFAULT_COPY_SCORE 40 + /* * One blob in a commit that is being suspected */ @@ -115,6 +118,13 @@ struct blame_scoreboard { int num_read_blob; int num_get_patch; int num_commits; + + /* +* blame for a blame_entry with score lower than these thresholds +* is not passed to the parent using move/copy logic. +*/ + unsigned move_score; + unsigned copy_score; }; #endif /* BLAME_H */ diff --git a/builtin/blame.c b/builtin/blame.c index 12add06..949e179 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -67,14 +67,8 @@ static struct string_list mailmap = STRING_LIST_INIT_NODUP; #define PICKAXE_BLAME_COPY_HARDER 04 #define PICKAXE_BLAME_COPY_HARDEST 010 -/* - * blame for a blame_entry with score lower than these thresholds - * is not passed to the parent using move/copy logic. - */ static unsigned blame_move_score; static unsigned blame_copy_score; -#define BLAME_DEFAULT_MOVE_SCORE 20 -#define BLAME_DEFAULT_COPY_SCORE 40 /* Remember to update object flag allocation in object.h */ #define METAINFO_SHOWN (1u<<12) @@ -1047,7 +1041,7 @@ static void find_move_in_parent(struct blame_scoreboard *sb, next = e->next; find_copy_in_blob(sb, e, parent, split, _p); if (split[1].suspect && - blame_move_score < ent_score(sb, [1])) { + sb->move_score < ent_score(sb, [1])) { split_blame(blamed, , split, e); } else { e->next = leftover; @@ -1056,7 +1050,7 @@ static void find_move_in_parent(struct blame_scoreboard *sb, decref_split(split); } *unblamedtail = NULL; - toosmall = filter_small(sb, toosmall, , blame_move_score); + toosmall = filter_small(sb, toosmall, , sb->move_score); } while (unblamed); target->suspects = reverse_blame(leftover, NULL); } @@ -1177,7 +1171,7 @@ static void find_copy_in_parent(struct blame_scoreboard *sb, for (j = 0; j < num_ents; j++) { struct blame_entry *split = blame_list[j].split; if (split[1].suspect && - blame_copy_score < ent_score(sb, [1])) { + sb->copy_score < ent_score(sb, [1])) { split_blame(blamed, , split, blame_list[j].ent); } else { @@ -1188,7 +1182,7 @@ static void find_copy_in_parent(struct blame_scoreboard *sb, } free(blame_list); *unblamedtail = NULL; - toosmall = filter_small(sb, toosmall, , blame_copy_score); + toosmall = filter_small(sb, toosmall, , sb->copy_score); } while (unblamed); target->suspects = reverse_blame(leftover, NULL); diff_flush(_opts); @@ -1345,7 +1339,7 @@ static void pass_blame(struct blame_scoreboard *sb, struct blame_origin *origin, * Optionally find moves in parents' files. */ if (opt & PICKAXE_BLAME_MOVE) { - filter_small(sb, , >suspects, blame_move_score); + filter_small(sb, , >suspects, sb->move_score); if (origin->suspects) { for (i = 0, sg = first_scapegoat(revs, commit); i < num_sg && sg; @@ -1364,12 +1358,12 @@ static void pass_blame(struct blame_scoreboard *sb, struct blame_origin *origin, * Optionally find copies from parents' files. */ if (opt & PICKAXE_BLAME_COPY) { - if (blame_copy_score > blame_move_score) - filter_small(sb, , >suspects, blame_copy_score); - else if (blame_copy_score < blame_move_score) { + if (sb->copy_score > sb->move_score) + filter_small(sb, , >suspects, sb->copy_score); + else if (sb->copy_score < sb->move_score) { origin->suspects = blame_merge(origin->suspects, toosmall); toosmall = NULL; - filter_small(sb, , >suspects, blame_copy_score); + filter_small(sb, , >suspects, sb->copy_score);
[RFC PATCH v2 11/22] blame: move xdl_opts flags to scoreboard
The xdl_opts flags are used in parts of blame that are being moved to libgit, and should be accessible via the scoreboard structure. Signed-off-by: Jeff Smith--- blame.h | 1 + builtin/blame.c | 7 --- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/blame.h b/blame.h index 7691256..42a948d 100644 --- a/blame.h +++ b/blame.h @@ -132,6 +132,7 @@ struct blame_scoreboard { /* flags */ int reverse; int show_root; + int xdl_opts; }; #endif /* BLAME_H */ diff --git a/builtin/blame.c b/builtin/blame.c index 77fb71f..d3af02b 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -80,7 +80,7 @@ struct progress_info { }; static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b, - xdl_emit_hunk_consume_func_t hunk_func, void *cb_data) + xdl_emit_hunk_consume_func_t hunk_func, void *cb_data, int xdl_opts) { xpparam_t xpp = {0}; xdemitconf_t xecfg = {0}; @@ -823,7 +823,7 @@ static void pass_blame_to_parent(struct blame_scoreboard *sb, fill_origin_blob(>revs->diffopt, target, _o, >num_read_blob); sb->num_get_patch++; - if (diff_hunks(_p, _o, blame_chunk_cb, )) + if (diff_hunks(_p, _o, blame_chunk_cb, , sb->xdl_opts)) die("unable to generate diff (%s -> %s)", oid_to_hex(>commit->object.oid), oid_to_hex(>commit->object.oid)); @@ -972,7 +972,7 @@ static void find_copy_in_blob(struct blame_scoreboard *sb, * file_p partially may match that image. */ memset(split, 0, sizeof(struct blame_entry [3])); - if (diff_hunks(file_p, _o, handle_split_cb, )) + if (diff_hunks(file_p, _o, handle_split_cb, , sb->xdl_opts)) die("unable to generate diff (%s)", oid_to_hex(>commit->object.oid)); /* remainder, if any, all match the preimage */ @@ -2762,6 +2762,7 @@ int cmd_blame(int argc, const char **argv, const char *prefix) sb.copy_score = blame_copy_score; sb.show_root = show_root; + sb.xdl_opts = xdl_opts; read_mailmap(, NULL); -- 2.9.3
[RFC PATCH v2 05/22] blame: move scoreboard structure to header
The scoreboard structure is core to the blame interface. Since scoreboard will be more exposed, rename it to blame_scoreboard to clarify what it is a part of. Signed-off-by: Jeff Smith--- blame.h | 29 builtin/blame.c | 83 +++-- 2 files changed, 57 insertions(+), 55 deletions(-) diff --git a/blame.h b/blame.h index f52d0fc..8d6a6fd 100644 --- a/blame.h +++ b/blame.h @@ -4,6 +4,8 @@ #include "cache.h" #include "commit.h" #include "xdiff-interface.h" +#include "revision.h" +#include "prio-queue.h" /* * One blob in a commit that is being suspected @@ -83,4 +85,31 @@ struct blame_entry { unsigned score; }; +/* + * The current state of the blame assignment. + */ +struct blame_scoreboard { + /* the final commit (i.e. where we started digging from) */ + struct commit *final; + /* Priority queue for commits with unassigned blame records */ + struct prio_queue commits; + struct rev_info *revs; + const char *path; + + /* +* The contents in the final image. +* Used by many functions to obtain contents of the nth line, +* indexed with scoreboard.lineno[blame_entry.lno]. +*/ + const char *final_buf; + unsigned long final_buf_size; + + /* linked list of blames */ + struct blame_entry *ent; + + /* look-up a line in the final buffer */ + int num_lines; + int *lineno; +}; + #endif /* BLAME_H */ diff --git a/builtin/blame.c b/builtin/blame.c index caa5ee0..4a1da53 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -273,41 +273,14 @@ static int compare_commits_by_reverse_commit_date(const void *a, return -compare_commits_by_commit_date(a, b, c); } -/* - * The current state of the blame assignment. - */ -struct scoreboard { - /* the final commit (i.e. where we started digging from) */ - struct commit *final; - /* Priority queue for commits with unassigned blame records */ - struct prio_queue commits; - struct rev_info *revs; - const char *path; - - /* -* The contents in the final image. -* Used by many functions to obtain contents of the nth line, -* indexed with scoreboard.lineno[blame_entry.lno]. -*/ - const char *final_buf; - unsigned long final_buf_size; - - /* linked list of blames */ - struct blame_entry *ent; - - /* look-up a line in the final buffer */ - int num_lines; - int *lineno; -}; - -static void sanity_check_refcnt(struct scoreboard *); +static void sanity_check_refcnt(struct blame_scoreboard *); /* * If two blame entries that are next to each other came from * contiguous lines in the same origin (i.e. pair), * merge them together. */ -static void coalesce(struct scoreboard *sb) +static void coalesce(struct blame_scoreboard *sb) { struct blame_entry *ent, *next; @@ -333,7 +306,7 @@ static void coalesce(struct scoreboard *sb) * the commit priority queue of the score board. */ -static void queue_blames(struct scoreboard *sb, struct blame_origin *porigin, +static void queue_blames(struct blame_scoreboard *sb, struct blame_origin *porigin, struct blame_entry *sorted) { if (porigin->suspects) @@ -576,14 +549,14 @@ static void dup_entry(struct blame_entry ***queue, *queue = >next; } -static const char *nth_line(struct scoreboard *sb, long lno) +static const char *nth_line(struct blame_scoreboard *sb, long lno) { return sb->final_buf + sb->lineno[lno]; } static const char *nth_line_cb(void *data, long lno) { - return nth_line((struct scoreboard *)data, lno); + return nth_line((struct blame_scoreboard *)data, lno); } /* @@ -842,7 +815,7 @@ static int blame_chunk_cb(long start_a, long count_a, * for the lines it is suspected to its parent. Run diff to find * which lines came from parent and pass blame for them. */ -static void pass_blame_to_parent(struct scoreboard *sb, +static void pass_blame_to_parent(struct blame_scoreboard *sb, struct blame_origin *target, struct blame_origin *parent) { @@ -882,7 +855,7 @@ static void pass_blame_to_parent(struct scoreboard *sb, * * Compute how trivial the lines in the blame_entry are. */ -static unsigned ent_score(struct scoreboard *sb, struct blame_entry *e) +static unsigned ent_score(struct blame_scoreboard *sb, struct blame_entry *e) { unsigned score; const char *cp, *ep; @@ -909,7 +882,7 @@ static unsigned ent_score(struct scoreboard *sb, struct blame_entry *e) * so far, by comparing this and best_so_far and copying this into * bst_so_far as needed. */ -static void copy_split_if_better(struct scoreboard *sb, +static void copy_split_if_better(struct blame_scoreboard *sb,
[RFC PATCH v2 10/22] blame: move show_root flag to scoreboard
The show_root flag is used in parts of blame that are being moved to libgit, and should be accessible via the scoreboard structure. Signed-off-by: Jeff Smith--- blame.h | 1 + builtin/blame.c | 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/blame.h b/blame.h index 85703dd..7691256 100644 --- a/blame.h +++ b/blame.h @@ -131,6 +131,7 @@ struct blame_scoreboard { /* flags */ int reverse; + int show_root; }; #endif /* BLAME_H */ diff --git a/builtin/blame.c b/builtin/blame.c index ed50eda..77fb71f 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -1660,7 +1660,7 @@ static void assign_blame(struct blame_scoreboard *sb, int opt) mark_parents_uninteresting(commit); } /* treat root commit as boundary */ - if (!commit->parents && !show_root) + if (!commit->parents && !sb->show_root) commit->object.flags |= UNINTERESTING; /* Take responsibility for the remaining entries */ @@ -2761,6 +2761,8 @@ int cmd_blame(int argc, const char **argv, const char *prefix) if (blame_copy_score) sb.copy_score = blame_copy_score; + sb.show_root = show_root; + read_mailmap(, NULL); assign_blame(, opt); -- 2.9.3
[RFC PATCH v2 19/22] blame: move scoreboard-related methods to libgit
Signed-off-by: Jeff Smith--- blame.c | 1313 ++ blame.h | 11 + builtin/blame.c | 1330 +-- 3 files changed, 1330 insertions(+), 1324 deletions(-) diff --git a/blame.c b/blame.c index 8c025bc..798e61b 100644 --- a/blame.c +++ b/blame.c @@ -1,6 +1,9 @@ #include "cache.h" #include "refs.h" #include "cache-tree.h" +#include "mergesort.h" +#include "diff.h" +#include "diffcore.h" #include "blame.h" void blame_origin_decref(struct blame_origin *o) @@ -261,3 +264,1313 @@ struct commit *fake_working_tree_commit(struct diff_options *opt, return commit; } + + + +static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b, + xdl_emit_hunk_consume_func_t hunk_func, void *cb_data, int xdl_opts) +{ + xpparam_t xpp = {0}; + xdemitconf_t xecfg = {0}; + xdemitcb_t ecb = {NULL}; + + xpp.flags = xdl_opts; + xecfg.hunk_func = hunk_func; + ecb.priv = cb_data; + return xdi_diff(file_a, file_b, , , ); +} + +/* + * Given an origin, prepare mmfile_t structure to be used by the + * diff machinery + */ +static void fill_origin_blob(struct diff_options *opt, +struct blame_origin *o, mmfile_t *file, int *num_read_blob) +{ + if (!o->file.ptr) { + enum object_type type; + unsigned long file_size; + + (*num_read_blob)++; + if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV) && + textconv_object(o->path, o->mode, >blob_oid, 1, >ptr, _size)) + ; + else + file->ptr = read_sha1_file(o->blob_oid.hash, , + _size); + file->size = file_size; + + if (!file->ptr) + die("Cannot read blob %s for path %s", + oid_to_hex(>blob_oid), + o->path); + o->file = *file; + } + else + *file = o->file; +} + +static void drop_origin_blob(struct blame_origin *o) +{ + if (o->file.ptr) { + free(o->file.ptr); + o->file.ptr = NULL; + } +} + +/* + * Any merge of blames happens on lists of blames that arrived via + * different parents in a single suspect. In this case, we want to + * sort according to the suspect line numbers as opposed to the final + * image line numbers. The function body is somewhat longish because + * it avoids unnecessary writes. + */ + +static struct blame_entry *blame_merge(struct blame_entry *list1, + struct blame_entry *list2) +{ + struct blame_entry *p1 = list1, *p2 = list2, + **tail = + + if (!p1) + return p2; + if (!p2) + return p1; + + if (p1->s_lno <= p2->s_lno) { + do { + tail = >next; + if ((p1 = *tail) == NULL) { + *tail = p2; + return list1; + } + } while (p1->s_lno <= p2->s_lno); + } + for (;;) { + *tail = p2; + do { + tail = >next; + if ((p2 = *tail) == NULL) { + *tail = p1; + return list1; + } + } while (p1->s_lno > p2->s_lno); + *tail = p1; + do { + tail = >next; + if ((p1 = *tail) == NULL) { + *tail = p2; + return list1; + } + } while (p1->s_lno <= p2->s_lno); + } +} + +static void *get_next_blame(const void *p) +{ + return ((struct blame_entry *)p)->next; +} + +static void set_next_blame(void *p1, void *p2) +{ + ((struct blame_entry *)p1)->next = p2; +} + +/* + * Final image line numbers are all different, so we don't need a + * three-way comparison here. + */ + +static int compare_blame_final(const void *p1, const void *p2) +{ + return ((struct blame_entry *)p1)->lno > ((struct blame_entry *)p2)->lno + ? 1 : -1; +} + +static int compare_blame_suspect(const void *p1, const void *p2) +{ + const struct blame_entry *s1 = p1, *s2 = p2; + /* +* to allow for collating suspects, we sort according to the +* respective pointer value as the primary sorting criterion. +* The actual relation is pretty unimportant as long as it +* establishes a total order. Comparing as integers gives us +* that. +*/ + if (s1->suspect != s2->suspect) + return (intptr_t)s1->suspect > (intptr_t)s2->suspect ? 1 : -1; + if (s1->s_lno ==
[RFC PATCH v2 02/22] blame: move textconv_object with related functions
textconv_object is used in places other than blame.c and should be moved to a more appropriate location. Other textconv related functions are located in diff.c so that seems as good a place as any. Signed-off-by: Jeff Smith--- builtin.h | 2 -- builtin/blame.c| 28 builtin/cat-file.c | 1 + diff.c | 23 +++ diff.h | 7 +++ 5 files changed, 31 insertions(+), 30 deletions(-) diff --git a/builtin.h b/builtin.h index 9e4a898..498ac80 100644 --- a/builtin.h +++ b/builtin.h @@ -25,8 +25,6 @@ struct fmt_merge_msg_opts { extern int fmt_merge_msg(struct strbuf *in, struct strbuf *out, struct fmt_merge_msg_opts *); -extern int textconv_object(const char *path, unsigned mode, const struct object_id *oid, int oid_valid, char **buf, unsigned long *buf_size); - extern int is_builtin(const char *s); extern int cmd_add(int argc, const char **argv, const char *prefix); diff --git a/builtin/blame.c b/builtin/blame.c index 42c56eb..c419981 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -147,34 +147,6 @@ static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b, } /* - * Prepare diff_filespec and convert it using diff textconv API - * if the textconv driver exists. - * Return 1 if the conversion succeeds, 0 otherwise. - */ -int textconv_object(const char *path, - unsigned mode, - const struct object_id *oid, - int oid_valid, - char **buf, - unsigned long *buf_size) -{ - struct diff_filespec *df; - struct userdiff_driver *textconv; - - df = alloc_filespec(path); - fill_filespec(df, oid->hash, oid_valid, mode); - textconv = get_textconv(df); - if (!textconv) { - free_filespec(df); - return 0; - } - - *buf_size = fill_textconv(textconv, df, buf); - free_filespec(df); - return 1; -} - -/* * Given an origin, prepare mmfile_t structure to be used by the * diff machinery */ diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 1890d7a..79a2c82 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -5,6 +5,7 @@ */ #include "cache.h" #include "builtin.h" +#include "diff.h" #include "parse-options.h" #include "userdiff.h" #include "streaming.h" diff --git a/diff.c b/diff.c index 74283d9..040fb25 100644 --- a/diff.c +++ b/diff.c @@ -5270,6 +5270,29 @@ size_t fill_textconv(struct userdiff_driver *driver, return size; } +int textconv_object(const char *path, + unsigned mode, + const struct object_id *oid, + int oid_valid, + char **buf, + unsigned long *buf_size) +{ + struct diff_filespec *df; + struct userdiff_driver *textconv; + + df = alloc_filespec(path); + fill_filespec(df, oid->hash, oid_valid, mode); + textconv = get_textconv(df); + if (!textconv) { + free_filespec(df); + return 0; + } + + *buf_size = fill_textconv(textconv, df, buf); + free_filespec(df); + return 1; +} + void setup_diff_pager(struct diff_options *opt) { /* diff --git a/diff.h b/diff.h index 5be1ee7..52ebd54 100644 --- a/diff.h +++ b/diff.h @@ -385,6 +385,13 @@ extern size_t fill_textconv(struct userdiff_driver *driver, */ extern struct userdiff_driver *get_textconv(struct diff_filespec *one); +/* + * Prepare diff_filespec and convert it using diff textconv API + * if the textconv driver exists. + * Return 1 if the conversion succeeds, 0 otherwise. + */ +extern int textconv_object(const char *path, unsigned mode, const struct object_id *oid, int oid_valid, char **buf, unsigned long *buf_size); + extern int parse_rename_score(const char **cp_p); extern long parse_algorithm_value(const char *value); -- 2.9.3
[RFC PATCH v2 06/22] blame: move stat counters to scoreboard
Statistic counters are used in parts of blame that are being moved to libgit, and should be accessible via the scoreboard structure. Signed-off-by: Jeff Smith--- blame.h | 5 + builtin/blame.c | 29 - 2 files changed, 17 insertions(+), 17 deletions(-) diff --git a/blame.h b/blame.h index 8d6a6fd..3438052 100644 --- a/blame.h +++ b/blame.h @@ -110,6 +110,11 @@ struct blame_scoreboard { /* look-up a line in the final buffer */ int num_lines; int *lineno; + + /* stats */ + int num_read_blob; + int num_get_patch; + int num_commits; }; #endif /* BLAME_H */ diff --git a/builtin/blame.c b/builtin/blame.c index 4a1da53..12add06 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -62,11 +62,6 @@ static struct string_list mailmap = STRING_LIST_INIT_NODUP; #define DEBUG 0 #endif -/* stats */ -static int num_read_blob; -static int num_get_patch; -static int num_commits; - #define PICKAXE_BLAME_MOVE 01 #define PICKAXE_BLAME_COPY 02 #define PICKAXE_BLAME_COPY_HARDER 04 @@ -108,13 +103,13 @@ static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b, * diff machinery */ static void fill_origin_blob(struct diff_options *opt, -struct blame_origin *o, mmfile_t *file) +struct blame_origin *o, mmfile_t *file, int *num_read_blob) { if (!o->file.ptr) { enum object_type type; unsigned long file_size; - num_read_blob++; + (*num_read_blob)++; if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV) && textconv_object(o->path, o->mode, >blob_oid, 1, >ptr, _size)) ; @@ -830,9 +825,9 @@ static void pass_blame_to_parent(struct blame_scoreboard *sb, d.offset = 0; d.dstq = d.srcq = >suspects; - fill_origin_blob(>revs->diffopt, parent, _p); - fill_origin_blob(>revs->diffopt, target, _o); - num_get_patch++; + fill_origin_blob(>revs->diffopt, parent, _p, >num_read_blob); + fill_origin_blob(>revs->diffopt, target, _o, >num_read_blob); + sb->num_get_patch++; if (diff_hunks(_p, _o, blame_chunk_cb, )) die("unable to generate diff (%s -> %s)", @@ -1036,7 +1031,7 @@ static void find_move_in_parent(struct blame_scoreboard *sb, if (!unblamed) return; /* nothing remains for this target */ - fill_origin_blob(>revs->diffopt, parent, _p); + fill_origin_blob(>revs->diffopt, parent, _p, >num_read_blob); if (!file_p.ptr) return; @@ -1165,7 +1160,7 @@ static void find_copy_in_parent(struct blame_scoreboard *sb, norigin = get_origin(parent, p->one->path); oidcpy(>blob_oid, >one->oid); norigin->mode = p->one->mode; - fill_origin_blob(>revs->diffopt, norigin, _p); + fill_origin_blob(>revs->diffopt, norigin, _p, >num_read_blob); if (!file_p.ptr) continue; @@ -1330,7 +1325,7 @@ static void pass_blame(struct blame_scoreboard *sb, struct blame_origin *origin, } } - num_commits++; + sb->num_commits++; for (i = 0, sg = first_scapegoat(revs, commit); i < num_sg && sg; sg = sg->next, i++) { @@ -2714,7 +2709,7 @@ int cmd_blame(int argc, const char **argv, const char *prefix) oid_to_hex(>blob_oid), path); } - num_read_blob++; + sb.num_read_blob++; lno = prepare_lines(); if (lno && !range_list.nr) @@ -2795,9 +2790,9 @@ int cmd_blame(int argc, const char **argv, const char *prefix) } if (show_stats) { - printf("num read blob: %d\n", num_read_blob); - printf("num get patch: %d\n", num_get_patch); - printf("num commits: %d\n", num_commits); + printf("num read blob: %d\n", sb.num_read_blob); + printf("num get patch: %d\n", sb.num_get_patch); + printf("num commits: %d\n", sb.num_commits); } return 0; } -- 2.9.3
[RFC PATCH v2 09/22] blame: move reverse flag to scoreboard
The reverse flag is used in parts of blame that are being moved to libgit, and should be accessible via the scoreboard structure. Signed-off-by: Jeff Smith--- blame.h | 3 +++ builtin/blame.c | 20 +++- 2 files changed, 14 insertions(+), 9 deletions(-) diff --git a/blame.h b/blame.h index 388309d..85703dd 100644 --- a/blame.h +++ b/blame.h @@ -128,6 +128,9 @@ struct blame_scoreboard { /* use this file's contents as the final image */ const char *contents_from; + + /* flags */ + int reverse; }; #endif /* BLAME_H */ diff --git a/builtin/blame.c b/builtin/blame.c index 032fd15..ed50eda 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -1218,7 +1218,8 @@ static void pass_whole_blame(struct blame_scoreboard *sb, * "parent" (and "porigin"), but what we mean is to find scapegoat to * exonerate ourselves. */ -static struct commit_list *first_scapegoat(struct rev_info *revs, struct commit *commit) +static struct commit_list *first_scapegoat(struct rev_info *revs, struct commit *commit, + int reverse) { if (!reverse) { if (revs->first_parent_only && @@ -1232,9 +1233,9 @@ static struct commit_list *first_scapegoat(struct rev_info *revs, struct commit return lookup_decoration(>children, >object); } -static int num_scapegoats(struct rev_info *revs, struct commit *commit) +static int num_scapegoats(struct rev_info *revs, struct commit *commit, int reverse) { - struct commit_list *l = first_scapegoat(revs, commit); + struct commit_list *l = first_scapegoat(revs, commit, reverse); return commit_list_count(l); } @@ -1272,7 +1273,7 @@ static void pass_blame(struct blame_scoreboard *sb, struct blame_origin *origin, struct blame_entry *toosmall = NULL; struct blame_entry *blames, **blametail = - num_sg = num_scapegoats(revs, commit); + num_sg = num_scapegoats(revs, commit, sb->reverse); if (!num_sg) goto finish; else if (num_sg < ARRAY_SIZE(sg_buf)) @@ -1288,7 +1289,7 @@ static void pass_blame(struct blame_scoreboard *sb, struct blame_origin *origin, struct blame_origin *(*find)(struct commit *, struct blame_origin *); find = pass ? find_rename : find_origin; - for (i = 0, sg = first_scapegoat(revs, commit); + for (i = 0, sg = first_scapegoat(revs, commit, sb->reverse); i < num_sg && sg; sg = sg->next, i++) { struct commit *p = sg->item; @@ -1320,7 +1321,7 @@ static void pass_blame(struct blame_scoreboard *sb, struct blame_origin *origin, } sb->num_commits++; - for (i = 0, sg = first_scapegoat(revs, commit); + for (i = 0, sg = first_scapegoat(revs, commit, sb->reverse); i < num_sg && sg; sg = sg->next, i++) { struct blame_origin *porigin = sg_origin[i]; @@ -1341,7 +1342,7 @@ static void pass_blame(struct blame_scoreboard *sb, struct blame_origin *origin, if (opt & PICKAXE_BLAME_MOVE) { filter_small(sb, , >suspects, sb->move_score); if (origin->suspects) { - for (i = 0, sg = first_scapegoat(revs, commit); + for (i = 0, sg = first_scapegoat(revs, commit, sb->reverse); i < num_sg && sg; sg = sg->next, i++) { struct blame_origin *porigin = sg_origin[i]; @@ -1368,7 +1369,7 @@ static void pass_blame(struct blame_scoreboard *sb, struct blame_origin *origin, if (!origin->suspects) goto finish; - for (i = 0, sg = first_scapegoat(revs, commit); + for (i = 0, sg = first_scapegoat(revs, commit, sb->reverse); i < num_sg && sg; sg = sg->next, i++) { struct blame_origin *porigin = sg_origin[i]; @@ -1649,7 +1650,7 @@ static void assign_blame(struct blame_scoreboard *sb, int opt) */ origin_incref(suspect); parse_commit(commit); - if (reverse || + if (sb->reverse || (!(commit->object.flags & UNINTERESTING) && !(revs->max_age != -1 && commit->date < revs->max_age))) pass_blame(sb, suspect, opt); @@ -2618,6 +2619,7 @@ int cmd_blame(int argc, const char **argv, const char *prefix) sb.revs = sb.contents_from = contents_from; + sb.reverse = reverse; if (!reverse) { final_commit_name = prepare_final(); sb.commits.compare = compare_commits_by_commit_date; -- 2.9.3
[RFC PATCH v2 00/22] Add blame to libgit
Rather than duplicate large portions of builtin/blame.c in cgit, it would be better to shift its core functionality into libgit.a. The functionality left in builtin/blame.c mostly relates to terminal presentation. Since initial patchset: Made commit titles consistent Broke some commits into more atomic pieces Fleshed out commit message bodies Made public structure and method names more clearly blame-related Jeff Smith (22): blame: remove unneeded dependency on blob.h blame: move textconv_object with related functions blame: remove unused parameters blame: move origin and entry structures to header blame: move scoreboard structure to header blame: move stat counters to scoreboard blame: move copy/move thresholds to scoreboard blame: move contents_from to scoreboard blame: move reverse flag to scoreboard blame: move show_root flag to scoreboard blame: move xdl_opts flags to scoreboard blame: move no_whole_file_rename flag to scoreboard blame: make sanity_check use a callback in scoreboard blame: move progess updates to a scoreboard callback blame: wrap blame_sort and compare_blame_final blame: rework methods that determine 'final' commit blame: move origin-related methods to libgit blame: move fake-commit-related methods to libgit blame: move scoreboard-related methods to libgit blame: create scoreboard init function in libgit blame: create scoreboard setup function in libgit blame: create entry prepend function in libgit Makefile |1 + blame.c| 1863 + blame.h| 175 + builtin.h |2 - builtin/blame.c| 2130 ++-- builtin/cat-file.c |1 + diff.c | 23 + diff.h |7 + 8 files changed, 2143 insertions(+), 2059 deletions(-) create mode 100644 blame.c create mode 100644 blame.h -- 2.9.3
[RFC PATCH v2 03/22] blame: remove unused parameters
Clean up blame code before moving it into libgit Signed-off-by: Jeff Smith--- builtin/blame.c | 25 ++--- 1 file changed, 10 insertions(+), 15 deletions(-) diff --git a/builtin/blame.c b/builtin/blame.c index c419981..e30b3ef 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -449,9 +449,7 @@ static struct origin *make_origin(struct commit *commit, const char *path) * Locate an existing origin or create a new one. * This moves the origin to front position in the commit util list. */ -static struct origin *get_origin(struct scoreboard *sb, -struct commit *commit, -const char *path) +static struct origin *get_origin(struct commit *commit, const char *path) { struct origin *o, *l; @@ -499,8 +497,7 @@ static int fill_blob_sha1_and_mode(struct origin *origin) * We have an origin -- check if the same path exists in the * parent and return an origin structure to represent it. */ -static struct origin *find_origin(struct scoreboard *sb, - struct commit *parent, +static struct origin *find_origin(struct commit *parent, struct origin *origin) { struct origin *porigin; @@ -543,7 +540,7 @@ static struct origin *find_origin(struct scoreboard *sb, if (!diff_queued_diff.nr) { /* The path is the same as parent */ - porigin = get_origin(sb, parent, origin->path); + porigin = get_origin(parent, origin->path); oidcpy(>blob_oid, >blob_oid); porigin->mode = origin->mode; } else { @@ -569,7 +566,7 @@ static struct origin *find_origin(struct scoreboard *sb, die("internal error in blame::find_origin (%c)", p->status); case 'M': - porigin = get_origin(sb, parent, origin->path); + porigin = get_origin(parent, origin->path); oidcpy(>blob_oid, >one->oid); porigin->mode = p->one->mode; break; @@ -588,8 +585,7 @@ static struct origin *find_origin(struct scoreboard *sb, * We have an origin -- find the path that corresponds to it in its * parent and return an origin structure to represent it. */ -static struct origin *find_rename(struct scoreboard *sb, - struct commit *parent, +static struct origin *find_rename(struct commit *parent, struct origin *origin) { struct origin *porigin = NULL; @@ -615,7 +611,7 @@ static struct origin *find_rename(struct scoreboard *sb, struct diff_filepair *p = diff_queued_diff.queue[i]; if ((p->status == 'R' || p->status == 'C') && !strcmp(p->two->path, origin->path)) { - porigin = get_origin(sb, parent, p->one->path); + porigin = get_origin(parent, p->one->path); oidcpy(>blob_oid, >one->oid); porigin->mode = p->one->mode; break; @@ -1270,7 +1266,7 @@ static void find_copy_in_parent(struct scoreboard *sb, /* find_move already dealt with this path */ continue; - norigin = get_origin(sb, parent, p->one->path); + norigin = get_origin(parent, p->one->path); oidcpy(>blob_oid, >one->oid); norigin->mode = p->one->mode; fill_origin_blob(>revs->diffopt, norigin, _p); @@ -1404,8 +1400,7 @@ static void pass_blame(struct scoreboard *sb, struct origin *origin, int opt) * common cases, then we look for renames in the second pass. */ for (pass = 0; pass < 2 - no_whole_file_rename; pass++) { - struct origin *(*find)(struct scoreboard *, - struct commit *, struct origin *); + struct origin *(*find)(struct commit *, struct origin *); find = pass ? find_rename : find_origin; for (i = 0, sg = first_scapegoat(revs, commit); @@ -1418,7 +1413,7 @@ static void pass_blame(struct scoreboard *sb, struct origin *origin, int opt) continue; if (parse_commit(p)) continue; - porigin = find(sb, p, origin); + porigin = find(p, origin); if (!porigin) continue; if (!oidcmp(>blob_oid, >blob_oid)) { @@ -2806,7 +2801,7 @@ int cmd_blame(int argc, const char **argv, const char *prefix) sb.final_buf_size = o->file.size; } else { - o = get_origin(, sb.final, path); +
Re: [PATCH] usage.c: drop set_error_handle()
On 05/12, Jeff King wrote: > The set_error_handle() function was introduced by 3b331e926 > (vreportf: report to arbitrary filehandles, 2015-08-11) so > that run-command could send post-fork, pre-exec errors to > the parent's original stderr. > > That use went away in 79319b194 (run-command: eliminate > calls to error handling functions in child, 2017-04-19), > which pushes all of the error reporting to the parent. > This leaves no callers of set_error_handle(). As we're not > likely to add any new ones, let's drop it. > > Signed-off-by: Jeff KingLooks good to me! Thanks -- Brandon Williams
[PATCH v2 7/7] grep: add support for PCRE v2
Add support for v2 of the PCRE API. This is a new major version of PCRE that came out in early 2015[1]. The regular expression syntax is the same, but while the API is similar, pretty much every function is either renamed or takes different arguments. Thus using it via entirely new functions makes sense, as opposed to trying to e.g. have one compile_pcre_pattern() that would call either PCRE v1 or v2 functions. Git can now be compiled with either USE_LIBPCRE1=YesPlease or USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a synonym for the former. Providing both is a compile-time error. With earlier patches to enable JIT for PCRE v1 the performance of the release versions of both libraries is almost exactly the same, with PCRE v2 being around 1% slower. However after I reported this to the pcre-dev mailing list[2] I got a lot of help with the API use from Zoltán Herczeg, he subsequently optimized some of the JIT functionality in v2 of the library. Running the p7820-grep-engines.sh performance test against the latest Subversion trunk of both, with both them and git compiled as -O3, and the test run against linux.git, gives the following results. Just the /perl/ tests shown: $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~2 HEAD~ HEAD p7820-grep-engines.sh [...] Test HEAD~2HEAD~ HEAD 7820.3: perl grep how.to 0.19(0.34+0.62) 0.18(0.39+0.57) -5.3% 0.19(0.32+0.61) +0.0% 7820.7: perl grep ^how to 0.21(0.68+0.51) 0.21(0.64+0.54) +0.0% 0.19(0.32+0.60) -9.5% 7820.11: perl grep [how] to 0.25(0.92+0.51) 0.26(0.93+0.49) +4.0% 0.21(0.39+0.62) -16.0% 7820.15: perl grep (e.t[^ ]*|v.ry) rare 0.26(1.18+0.42) 0.26(1.14+0.45) +0.0% 0.20(0.49+0.57) -23.1% 7820.19: perl grep m(ú|u)lt.b(æ|y)te 0.24(0.85+0.48) 0.23(0.92+0.41) -4.2% 0.19(0.36+0.56) -20.8% See commit ("perf: add a performance comparison test of grep -G, -E and -P", 2017-04-19) for further details on the machine the above test run was executed on. Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine mentioning p7820-grep-engines.sh for more details on the test setup. For ease of readability, a different run just of HEAD~ (PCRE v1 with JIT v.s. PCRE v2), again with just the /perl/ tests shown: Test HEAD~ HEAD --- 7820.3: perl grep how.to 0.19(0.40+0.56) 0.19(0.34+0.59) +0.0% 7820.7: perl grep ^how to 0.21(0.64+0.54) 0.19(0.30+0.63) -9.5% 7820.11: perl grep [how] to 0.25(0.94+0.48) 0.21(0.38+0.62) -16.0% 7820.15: perl grep (e.t[^ ]*|v.ry) rare 0.26(1.13+0.46) 0.20(0.48+0.58) -23.1% 7820.19: perl grep m(ú|u)lt.b(æ|y)te 0.23(0.84+0.50) 0.18(0.29+0.63) -21.7% I.e. the two are either neck-to-neck, but PCRE v2 usually pulls ahead, when it does it's around 20% faster. A brief note on thread safety: As noted in pcre2api(3) & pcre2jit(3) the compiled pattern can be shared between threads, but not some of the JIT context, however the grep threading support does all pattern & JIT compilation in separate threads, so this code doesn't need to concern itself with thread safety. See commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) for the initial addition of PCRE v1. This change follows some of the same patterns it did (and which were discussed on list at the time), e.g. mocking up types with typedef instead of ifdef-ing them out when USE_LIBPCRE2 isn't defined. This adds some trivial memory use to the program, but makes the code look nicer. 1. https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html 2. https://lists.exim.org/lurker/thread/20170419.172322.833ee099.en.html Signed-off-by: Ævar Arnfjörð Bjarmason--- Makefile | 30 +--- configure.ac | 77 ++- grep.c| 143 ++ grep.h| 17 +++ t/test-lib.sh | 2 +- 5 files changed, 250 insertions(+), 19 deletions(-) diff --git a/Makefile b/Makefile index
[PATCH v2 2/7] grep: skip pthreads overhead when using one thread
Skip the administrative overhead of using pthreads when only using one thread. Instead take the non-threaded path which would be taken under NO_PTHREADS. The threading support was initially added in commit 5b594f457a ("Threaded grep", 2010-01-25) with a hardcoded compile-time number of 8 threads. Later the number of threads was made configurable in commit 89f09dd34e ("grep: add --threads= option and grep.threads configuration", 2015-12-15). That change did not add any special handling for --threads=1. Now we take a slightly faster path by skipping thread handling entirely when 1 thread is requested. Signed-off-by: Ævar Arnfjörð Bjarmason--- builtin/grep.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/builtin/grep.c b/builtin/grep.c index 12e62fcbf3..bd008cb100 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -1238,6 +1238,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) num_threads = GREP_NUM_THREADS_DEFAULT; else if (num_threads < 0) die(_("invalid number of threads specified (%d)"), num_threads); + if (num_threads == 1) + num_threads = 0; #else if (num_threads) warning(_("no threads support, ignoring --threads")); -- 2.11.0
[PATCH v2 5/7] grep: un-break building with PCRE < 8.32
Amend my change earlier in this series ("grep: add support for the PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1 versions earlier than 8.32. The JIT support was added in version 8.20 released on 2011-10-21, but it wasn't until 8.32 released on 2012-11-30 that the fast code path to use the JIT via pcre_jit_exec() was added[1] (see also [2]). This means that versions 8.20 through 8.31 could still use the JIT, but supporting it on those versions would add to the already verbose macro soup around JIT support it, and I don't expect that the use-case of compiling a brand new git against a 5 year old PCRE is particularly common, and if someone does that they can just get the existing pre-JIT slow codepath. So just take the easy way out and disable the JIT on any version older than 8.32. The reason this change isn't part of the initial change PCRE JIT support is because possibly slightly annoying someone who's bisecting with an ancient PCRE is worth it to have a cleaner history showing which parts of the implementation are only used for ancient PCRE versions. This also makes it easier to revert this change if we ever decide to stop supporting those old versions. 1. http://www.pcre.org/original/changelog.txt ("28. Introducing a native interface for JIT. Through this interface, the compiled[...]") 2. https://bugs.exim.org/show_bug.cgi?id=2121 Signed-off-by: Ævar Arnfjörð Bjarmason--- grep.c | 8 grep.h | 5 + 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/grep.c b/grep.c index accf1c45e6..81337638ca 100644 --- a/grep.c +++ b/grep.c @@ -351,7 +351,7 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt) const char *error; int erroffset; int options = PCRE_MULTILINE; -#ifdef PCRE_CONFIG_JIT +#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT int canjit; #endif @@ -372,7 +372,7 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt) if (!p->pcre1_extra_info && error) die("%s", error); -#ifdef PCRE_CONFIG_JIT +#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT pcre_config(PCRE_CONFIG_JIT, ); if (canjit == 1) { p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024); @@ -392,7 +392,7 @@ static int pcre1match(struct grep_pat *p, const char *line, const char *eol, if (eflags & REG_NOTBOL) flags |= PCRE_NOTBOL; -#ifdef PCRE_CONFIG_JIT +#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT if (p->pcre1_jit_on) ret = pcre_jit_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line, 0, flags, ovector, @@ -420,7 +420,7 @@ static int pcre1match(struct grep_pat *p, const char *line, const char *eol, static void free_pcre1_regexp(struct grep_pat *p) { pcre_free(p->pcre1_regexp); -#ifdef PCRE_CONFIG_JIT +#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT if (p->pcre1_jit_on) { pcre_free_study(p->pcre1_extra_info); pcre_jit_stack_free(p->pcre1_jit_stack); diff --git a/grep.h b/grep.h index 14f47189f9..73ef0ef8ec 100644 --- a/grep.h +++ b/grep.h @@ -3,6 +3,11 @@ #include "color.h" #ifdef USE_LIBPCRE1 #include +#ifdef PCRE_CONFIG_JIT +#if PCRE_MAJOR >= 8 && PCRE_MINOR >= 32 +#define GIT_PCRE1_CAN_DO_MODERN_JIT +#endif +#endif #ifndef PCRE_STUDY_JIT_COMPILE #define PCRE_STUDY_JIT_COMPILE 0 #endif -- 2.11.0
[PATCH v2 6/7] grep: un-break building with PCRE < 8.20
Amend my change earlier in this series ("grep: add support for the PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1 versions earlier than 8.20. The 8.20 release was the first release to have JIT & pcre_jit_stack in the headers, so a mock type needs to be provided for it on those releases. Now git should compile with all PCRE versions that it supported before my JIT change. I've tested it as far back as version 7.5 released on 2008-01-10, once I got down to version 7.0 it wouldn't build anymore with GCC 7.1.1, and I couldn't be bothered to anything older than 7.5 as I'm confident that if the build breaks on those older versions it's not because of my JIT change. See the "un-break" change in this series ("grep: un-break building with PCRE < 8.32", 2017-05-10) for why this isn't squashed into the main PCRE JIT commit. Signed-off-by: Ævar Arnfjörð Bjarmason--- grep.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/grep.h b/grep.h index 73ef0ef8ec..b7b9d487b0 100644 --- a/grep.h +++ b/grep.h @@ -11,6 +11,9 @@ #ifndef PCRE_STUDY_JIT_COMPILE #define PCRE_STUDY_JIT_COMPILE 0 #endif +#if PCRE_MAJOR <= 8 && PCRE_MINOR < 20 +typedef int pcre_jit_stack; +#endif #else typedef int pcre; typedef int pcre_extra; -- 2.11.0
[PATCH v2 4/7] grep: add support for the PCRE v1 JIT API
Change the grep PCRE v1 code to use JIT when available. When PCRE support was initially added in commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) PCRE had no JIT support, it was integrated into 8.20 released on 2011-10-21. Enabling JIT support usually improves performance by more than 40%. The pattern compilation times are relatively slower, but those relative numbers are tiny, and are easily made back in all but the most trivial cases of grep. Detailed benchmarks & overview of compilation times is at: http://sljit.sourceforge.net/pcre.html With this change the difference in a t/perf/p7820-grep-engines.sh run is, with just the /perl/ tests shown: $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~ HEAD p7820-grep-engines.sh Test HEAD~ HEAD --- 7820.3: perl grep how.to 0.27(1.18+0.42) 0.19(0.30+0.68) -29.6% 7820.7: perl grep ^how to 0.48(2.74+0.41) 0.21(0.64+0.55) -56.2% 7820.11: perl grep [how] to 0.47(2.56+0.47) 0.26(0.86+0.56) -44.7% 7820.15: perl grep (e.t[^ ]*|v.ry) rare 0.88(5.79+0.38) 0.26(1.06+0.54) -70.5% 7820.19: perl grep m(ú|u)lt.b(æ|y)te 0.32(1.56+0.46) 0.23(0.86+0.48) -28.1% The conditional support for JIT is implemented as suggested in the pcrejit(3) man page. E.g. defining PCRE_STUDY_JIT_COMPILE to 0 if it's not present. The implementation is relatively verbose because even if PCRE_CONFIG_JIT is defined only a call to pcre_config() can determine if the JIT is available, and if so the faster pcre_jit_exec() function should be called instead of pcre_exec(), and a different (but not complimentary!) function needs to be called to free pcre1_extra_info. There's no graceful fallback if pcre_jit_stack_alloc() fails under PCRE_CONFIG_JIT, instead the program will simply abort. I don't think this is worth handling gracefully, it'll only fail in cases where malloc() doesn't work, in which case we're screwed anyway. Signed-off-by: Ævar Arnfjörð Bjarmason--- grep.c | 37 - grep.h | 6 ++ 2 files changed, 42 insertions(+), 1 deletion(-) diff --git a/grep.c b/grep.c index 1157529115..accf1c45e6 100644 --- a/grep.c +++ b/grep.c @@ -351,6 +351,9 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt) const char *error; int erroffset; int options = PCRE_MULTILINE; +#ifdef PCRE_CONFIG_JIT + int canjit; +#endif if (opt->ignore_case) { if (has_non_ascii(p->pattern)) @@ -365,9 +368,20 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt) if (!p->pcre1_regexp) compile_regexp_failed(p, error); - p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, ); + p->pcre1_extra_info = pcre_study(p->pcre1_regexp, PCRE_STUDY_JIT_COMPILE, ); if (!p->pcre1_extra_info && error) die("%s", error); + +#ifdef PCRE_CONFIG_JIT + pcre_config(PCRE_CONFIG_JIT, ); + if (canjit == 1) { + p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024); + if (!p->pcre1_jit_stack) + die("BUG: Couldn't allocate PCRE JIT stack"); + pcre_assign_jit_stack(p->pcre1_extra_info, NULL, p->pcre1_jit_stack); + p->pcre1_jit_on = 1; + } +#endif } static int pcre1match(struct grep_pat *p, const char *line, const char *eol, @@ -378,8 +392,20 @@ static int pcre1match(struct grep_pat *p, const char *line, const char *eol, if (eflags & REG_NOTBOL) flags |= PCRE_NOTBOL; +#ifdef PCRE_CONFIG_JIT + if (p->pcre1_jit_on) + ret = pcre_jit_exec(p->pcre1_regexp, p->pcre1_extra_info, line, + eol - line, 0, flags, ovector, + ARRAY_SIZE(ovector), p->pcre1_jit_stack); + else + ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line, + eol - line, 0, flags, ovector, + ARRAY_SIZE(ovector)); +#else ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line, 0, flags, ovector, ARRAY_SIZE(ovector)); +#endif + if (ret < 0 && ret != PCRE_ERROR_NOMATCH) die("pcre_exec failed with error code %d", ret); if (ret > 0) { @@ -394,7 +420,16 @@ static int pcre1match(struct grep_pat *p, const char *line, const char *eol, static void free_pcre1_regexp(struct grep_pat *p) {
[PATCH v2 3/7] log: add -P as a synonym for --perl-regexp
Add a short -P option as a synonym for the longer --perl-regexp, for consistency with the options the corresponding grep invocations accept. This was intentionally omitted in commit 727b6fc3ed ("log --grep: accept --basic-regexp and --perl-regexp", 2012-10-03) for unspecified future use. Make it consistent with "grep" rather than to keep it open for future use, and to avoid the confusion of -P meaning different things for grep & log, as is the case with the -G option. As noted in the aforementioned commit the --basic-regexp option can't have a corresponding -G argument, as the log command already uses that for -G. Signed-off-by: Ævar Arnfjörð Bjarmason--- Documentation/rev-list-options.txt | 1 + revision.c | 2 +- t/t4202-log.sh | 12 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index a46f70c2b1..9c44eae55d 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -91,6 +91,7 @@ endif::git-rev-list[] Consider the limiting patterns to be fixed strings (don't interpret pattern as a regular expression). +-P:: --perl-regexp:: Consider the limiting patterns to be Perl-compatible regular expressions. diff --git a/revision.c b/revision.c index 7ff61ff5f7..03a3a012de 100644 --- a/revision.c +++ b/revision.c @@ -1995,7 +1995,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg DIFF_OPT_SET(>diffopt, PICKAXE_IGNORE_CASE); } else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) { revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_FIXED; - } else if (!strcmp(arg, "--perl-regexp")) { + } else if (!strcmp(arg, "--perl-regexp") || !strcmp(arg, "-P")) { revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_PCRE; } else if (!strcmp(arg, "--all-match")) { revs->grep_filter.all_match = 1; diff --git a/t/t4202-log.sh b/t/t4202-log.sh index 9680dfe400..5c05c650cd 100755 --- a/t/t4202-log.sh +++ b/t/t4202-log.sh @@ -350,8 +350,20 @@ test_expect_success 'log with various grep.patternType configurations & command- --grep="(1|2)" >actual.fixed.short-arg && git log --pretty=tformat:%s -E \ --grep="\|2" >actual.extended.short-arg && + if test_have_prereq PCRE + then + git log --pretty=tformat:%s -P \ + --grep="[\d]\|" >actual.perl.short-arg + else + test_must_fail git log -P \ + --grep="[\d]\|" + fi && test_cmp expect.fixed actual.fixed.short-arg && test_cmp expect.extended actual.extended.short-arg && + if test_have_prereq PCRE + then + test_cmp expect.perl actual.perl.short-arg + fi && git log --pretty=tformat:%s --fixed-strings \ --grep="(1|2)" >actual.fixed.long-arg && -- 2.11.0
[PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading
Change the pattern compilation logic under threading so that grep doesn't compile a pattern it never ends up using on the non-threaded code path, only to compile it again N times for N threads which will each use their own copy, ignoring the initially compiled pattern. This redundant compilation dates back to the initial introduction of the threaded grep in commit 5b594f457a ("Threaded grep", 2010-01-25). There was never any reason for doing this redundant work other than an oversight in the initial commit. Jeff King suggested on-list in <20170414212325.fefrl3qdjigwy...@sigill.intra.peff.net> that this might be needed to check the pattern for sanity before threaded execution commences. That's not the case. The pattern is compiled under threading in start_threads() before any concurrent execution has started by calling pthread_create(), so if the pattern contains an error we still do the right thing. I.e. die with one error before any threaded execution has commenced, instead of e.g. spewing out an error for each N threads, which could be a regression a change like this might inadvertently introduce. This change is not meant as an optimization, any performance gains from this are in the hundreds to thousands of nanoseconds at most. If we wanted more performance here we could just re-use the compiled patterns in multiple threads (regcomp(3) is thread-safe), or partially re-use them and the associated structures in the case of later PCRE JIT changes. Rather, it's just to make the code easier to reason about. It's confusing to debug this under threading & non-threading when the threading codepaths redundantly compile a pattern which is never used. The reason the patterns are recompiled is as a side-effect of duplicating the whole grep_opt structure, which is not thread safe, writable, and munged during execution. The grep_opt structure then points to the grep_pat structure where pattern or patterns are stored. I looked into e.g. splitting the API into some "do & alloc threadsafe stuff", "spawn thread", "do and alloc non-threadsafe stuff", but the execution time of grep_opt_dup() & pattern compilation is trivial compared to actually executing the grep, so there was no point. Even with the more expensive JIT changes to follow the most expensive PCRE patterns take something like 0.0X milliseconds to compile at most[1]. The undocumented --debug mode added in commit 17bf35a3c7 ("grep: teach --debug option to dump the parse tree", 2012-09-13) still works properly with this change. It only emits debugging info during pattern compilation, which is now dumped by the pattern compiled just before the first thread is started. 1. http://sljit.sourceforge.net/pcre.html Signed-off-by: Ævar Arnfjörð Bjarmason--- builtin/grep.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index b1095362fb..12e62fcbf3 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -224,7 +224,8 @@ static void start_threads(struct grep_opt *opt) int err; struct grep_opt *o = grep_opt_dup(opt); o->output = strbuf_out; - o->debug = 0; + if (i) + o->debug = 0; compile_grep_patterns(o); err = pthread_create([i], NULL, run, o); @@ -1167,8 +1168,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix) if (!opt.fixed && opt.ignore_case) opt.regflags |= REG_ICASE; - compile_grep_patterns(); - /* * We have to find "--" in a separate pass, because its presence * influences how we will parse arguments that come before it. @@ -1245,6 +1244,15 @@ int cmd_grep(int argc, const char **argv, const char *prefix) num_threads = 0; #endif + if (!num_threads) + /* +* The compiled patterns on the main path are only +* used when not using threading. Otherwise +* start_threads() below calls compile_grep_patterns() +* for each thread. +*/ + compile_grep_patterns(); + #ifndef NO_PTHREADS if (num_threads) { if (!(opt.name_only || opt.unmatch_name_only || opt.count) -- 2.11.0
[PATCH v2 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes
Trivial changes since v1, but I wanted to send a new one for completeness since I re-sent the "Easy to review grep & pre-PCRE changes" series. For v1 see <20170511170142.15934-1-ava...@gmail.com> (https://public-inbox.org/git/20170511170142.15934-1-ava...@gmail.com/). Changes noted below & reply to who noted the issue: Ævar Arnfjörð Bjarmason (7): grep: don't redundantly compile throwaway patterns under threading Brandon: Added a few paragraphs to the commit message about why this change is being made, i.e. for ease of understanding the code, not optimization. grep: skip pthreads overhead when using one thread log: add -P as a synonym for --perl-regexp grep: add support for the PCRE v1 JIT API grep: un-break building with PCRE < 8.32 grep: un-break building with PCRE < 8.20 grep: add support for PCRE v2 No changes in these, except for re-running the performance tests & changing the commit messages accordingly, due to the change in the t/perf code in the previous series, as noted there. Documentation/rev-list-options.txt | 1 + Makefile | 30 +-- builtin/grep.c | 16 +++- configure.ac | 77 +--- grep.c | 180 - grep.h | 31 +++ revision.c | 2 +- t/t4202-log.sh | 12 +++ t/test-lib.sh | 2 +- 9 files changed, 327 insertions(+), 24 deletions(-) -- 2.11.0
[PATCH v2 27/29] pack-objects: fix buggy warning about threads
Fix a buggy warning about threads under NO_PTHREADS=YesPlease. Due to re-using the delta_search_threads variable for both the state of the "pack.threads" config & the --threads option, setting "pack.threads" but not supplying --threads would trigger the warning for both "pack.threads" & --threads. Solve this bug by resetting the delta_search_threads variable in git_pack_config(), it might then be set by --threads again and be subsequently warned about, as the test I'm changing here asserts. Signed-off-by: Ævar Arnfjörð Bjarmason--- builtin/pack-objects.c | 4 +++- t/t5300-pack-object.sh | 3 +-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 0fe35d1b5a..f1baf05dfe 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -2472,8 +2472,10 @@ static int git_pack_config(const char *k, const char *v, void *cb) die("invalid number of threads specified (%d)", delta_search_threads); #ifdef NO_PTHREADS - if (delta_search_threads != 1) + if (delta_search_threads != 1) { warning("no threads support, ignoring %s", k); + delta_search_threads = 0; + } #endif return 0; } diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh index 1629fa80b0..0ec25c4966 100755 --- a/t/t5300-pack-object.sh +++ b/t/t5300-pack-object.sh @@ -445,8 +445,7 @@ test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'pack-objects --threads=N or pack. git -c pack.threads=2 pack-objects --stdout --all /dev/null 2>err && cat err && grep ^warning: err >warnings && - test_line_count = 2 warnings && - grep -F "no threads support, ignoring --threads" err && + test_line_count = 1 warnings && grep -F "no threads support, ignoring pack.threads" err && git -c pack.threads=2 pack-objects --threads=4 --stdout --all /dev/null 2>err && grep ^warning: err >warnings && -- 2.11.0
[PATCH v2 20/29] grep: remove redundant `regflags &= ~REG_EXTENDED` assignments
Remove redundant assignments to the "regflags" variable. There are no code paths that have previously set the regflags to anything, and certainly not to `|= REG_EXTENDED`. This code gave the impression that it had to reset its environment, but it doesn't. This dates back to the initial introduction of git-grep in commit 5010cb5fcc ("built-in "git grep"", 2006-04-30). Signed-off-by: Ævar Arnfjörð Bjarmason--- grep.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/grep.c b/grep.c index 59ae7809f2..bf6c2494fd 100644 --- a/grep.c +++ b/grep.c @@ -179,7 +179,6 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st case GREP_PATTERN_TYPE_BRE: opt->fixed = 0; opt->pcre = 0; - opt->regflags &= ~REG_EXTENDED; break; case GREP_PATTERN_TYPE_ERE: @@ -191,7 +190,6 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st case GREP_PATTERN_TYPE_FIXED: opt->fixed = 1; opt->pcre = 0; - opt->regflags &= ~REG_EXTENDED; break; case GREP_PATTERN_TYPE_PCRE: @@ -414,10 +412,9 @@ static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt) { struct strbuf sb = STRBUF_INIT; int err; - int regflags; + int regflags = opt->regflags; basic_regex_quote_buf(, p->pattern); - regflags = opt->regflags & ~REG_EXTENDED; if (opt->ignore_case) regflags |= REG_ICASE; err = regcomp(>regexp, sb.buf, regflags); -- 2.11.0
[PATCH v2 25/29] test-lib: add a PTHREADS prerequisite
Add a PTHREADS prerequisite which is false when git is compiled with NO_PTHREADS=YesPlease. There's lots of custom code that runs when threading isn't available, but before this prerequisite there was no way to test it. Signed-off-by: Ævar Arnfjörð Bjarmason--- Makefile | 1 + t/README | 4 t/test-lib.sh | 1 + 3 files changed, 6 insertions(+) diff --git a/Makefile b/Makefile index 374fbc7e58..a79274e5e6 100644 --- a/Makefile +++ b/Makefile @@ -2242,6 +2242,7 @@ GIT-BUILD-OPTIONS: FORCE @echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+ @echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+ @echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+ + @echo NO_PTHREADS=\''$(subst ','\'',$(subst ','\'',$(NO_PTHREADS)))'\' >>$@+ @echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+ @echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+ @echo PAGER_ENV=\''$(subst ','\'',$(subst ','\'',$(PAGER_ENV)))'\' >>$@+ diff --git a/t/README b/t/README index a90cb62583..2f95860369 100644 --- a/t/README +++ b/t/README @@ -817,6 +817,10 @@ use these, and "test_set_prereq" for how to define your own. Test is run on a filesystem which converts decomposed utf-8 (nfd) to precomposed utf-8 (nfc). + - PTHREADS + + Git wasn't compiled with NO_PTHREADS=YesPlease. + Tips for Writing Tests -- diff --git a/t/test-lib.sh b/t/test-lib.sh index e5cfbcc36b..ab92c0ebaa 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -1009,6 +1009,7 @@ esac ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1 test -z "$NO_PERL" && test_set_prereq PERL +test -z "$NO_PTHREADS" && test_set_prereq PTHREADS test -z "$NO_PYTHON" && test_set_prereq PYTHON test -n "$USE_LIBPCRE1" && test_set_prereq PCRE test -z "$NO_GETTEXT" && test_set_prereq GETTEXT -- 2.11.0
[PATCH v2 23/29] grep: change internal *pcre* variable & function names to be *pcre1*
Change the internal PCRE variable & function names to have a "1" suffix. This is for preparation for libpcre2 support, where having non-versioned names would be confusing. An earlier change in this series ("grep: change the internal PCRE macro names to be PCRE1", 2017-04-07) elaborates on the motivations behind this change. Signed-off-by: Ævar Arnfjörð Bjarmason--- grep.c | 52 ++-- grep.h | 8 2 files changed, 30 insertions(+), 30 deletions(-) diff --git a/grep.c b/grep.c index 854f2404be..07512346b1 100644 --- a/grep.c +++ b/grep.c @@ -178,23 +178,23 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st case GREP_PATTERN_TYPE_BRE: opt->fixed = 0; - opt->pcre = 0; + opt->pcre1 = 0; break; case GREP_PATTERN_TYPE_ERE: opt->fixed = 0; - opt->pcre = 0; + opt->pcre1 = 0; opt->regflags |= REG_EXTENDED; break; case GREP_PATTERN_TYPE_FIXED: opt->fixed = 1; - opt->pcre = 0; + opt->pcre1 = 0; break; case GREP_PATTERN_TYPE_PCRE: opt->fixed = 0; - opt->pcre = 1; + opt->pcre1 = 1; break; } } @@ -334,7 +334,7 @@ static int has_null(const char *s, size_t len) } #ifdef USE_LIBPCRE1 -static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt) +static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt) { const char *error; int erroffset; @@ -342,23 +342,23 @@ static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt) if (opt->ignore_case) { if (has_non_ascii(p->pattern)) - p->pcre_tables = pcre_maketables(); + p->pcre1_tables = pcre_maketables(); options |= PCRE_CASELESS; } if (is_utf8_locale() && has_non_ascii(p->pattern)) options |= PCRE_UTF8; - p->pcre_regexp = pcre_compile(p->pattern, options, , , - p->pcre_tables); - if (!p->pcre_regexp) + p->pcre1_regexp = pcre_compile(p->pattern, options, , , + p->pcre1_tables); + if (!p->pcre1_regexp) compile_regexp_failed(p, error); - p->pcre_extra_info = pcre_study(p->pcre_regexp, 0, ); - if (!p->pcre_extra_info && error) + p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, ); + if (!p->pcre1_extra_info && error) die("%s", error); } -static int pcrematch(struct grep_pat *p, const char *line, const char *eol, +static int pcre1match(struct grep_pat *p, const char *line, const char *eol, regmatch_t *match, int eflags) { int ovector[30], ret, flags = 0; @@ -366,7 +366,7 @@ static int pcrematch(struct grep_pat *p, const char *line, const char *eol, if (eflags & REG_NOTBOL) flags |= PCRE_NOTBOL; - ret = pcre_exec(p->pcre_regexp, p->pcre_extra_info, line, eol - line, + ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line, 0, flags, ovector, ARRAY_SIZE(ovector)); if (ret < 0 && ret != PCRE_ERROR_NOMATCH) die("pcre_exec failed with error code %d", ret); @@ -379,25 +379,25 @@ static int pcrematch(struct grep_pat *p, const char *line, const char *eol, return ret; } -static void free_pcre_regexp(struct grep_pat *p) +static void free_pcre1_regexp(struct grep_pat *p) { - pcre_free(p->pcre_regexp); - pcre_free(p->pcre_extra_info); - pcre_free((void *)p->pcre_tables); + pcre_free(p->pcre1_regexp); + pcre_free(p->pcre1_extra_info); + pcre_free((void *)p->pcre1_tables); } #else /* !USE_LIBPCRE1 */ -static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt) +static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt) { die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE"); } -static int pcrematch(struct grep_pat *p, const char *line, const char *eol, +static int pcre1match(struct grep_pat *p, const char *line, const char *eol, regmatch_t *match, int eflags) { return 1; } -static void free_pcre_regexp(struct grep_pat *p) +static void free_pcre1_regexp(struct grep_pat *p) { } #endif /* !USE_LIBPCRE1 */ @@ -477,8 +477,8 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt) return; } - if (opt->pcre) { - compile_pcre_regexp(p, opt); + if (opt->pcre1) { + compile_pcre1_regexp(p, opt); return; } @@ -834,8 +834,8 @@ void
[PATCH v2 26/29] pack-objects & index-pack: add test for --threads warning
Add a test for the warning that's emitted when --threads or pack.threads is provided under NO_PTHREADS=YesPlease. This uses the new PTHREADS prerequisite. The assertion for C_LOCALE_OUTPUT in the latter test is currently redundant, since unlike index-pack the pack-objects warnings aren't i18n'd. However they might be changed to be i18n'd in the future, and there's no harm in future-proofing the test. There's an existing bug in the implementation of pack-objects which this test currently tests for as-is. Details about the bug & the fix are included in a follow-up change. Signed-off-by: Ævar Arnfjörð Bjarmason--- t/t5300-pack-object.sh | 34 ++ 1 file changed, 34 insertions(+) diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh index 43a672c345..1629fa80b0 100755 --- a/t/t5300-pack-object.sh +++ b/t/t5300-pack-object.sh @@ -421,6 +421,40 @@ test_expect_success 'index-pack works in non-repo' ' test_path_is_file foo.idx ' +test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'index-pack --threads=N or pack.threads=N warns when no pthreads' ' + test_must_fail git index-pack --threads=2 2>err && + grep ^warning: err >warnings && + test_line_count = 1 warnings && + grep -F "no threads support, ignoring --threads=2" err && + test_must_fail git -c pack.threads=2 index-pack 2>err && + grep ^warning: err >warnings && + test_line_count = 1 warnings && + grep -F "no threads support, ignoring pack.threads" err && + test_must_fail git -c pack.threads=2 index-pack --threads=4 2>err && + grep ^warning: err >warnings && + test_line_count = 2 warnings && + grep -F "no threads support, ignoring --threads=4" err && + grep -F "no threads support, ignoring pack.threads" err +' + +test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'pack-objects --threads=N or pack.threads=N warns when no pthreads' ' + git pack-objects --threads=2 --stdout --all /dev/null 2>err && + grep ^warning: err >warnings && + test_line_count = 1 warnings && + grep -F "no threads support, ignoring --threads" err && + git -c pack.threads=2 pack-objects --stdout --all /dev/null 2>err && + cat err && + grep ^warning: err >warnings && + test_line_count = 2 warnings && + grep -F "no threads support, ignoring --threads" err && + grep -F "no threads support, ignoring pack.threads" err && + git -c pack.threads=2 pack-objects --threads=4 --stdout --all /dev/null 2>err && + grep ^warning: err >warnings && + test_line_count = 2 warnings && + grep -F "no threads support, ignoring --threads" err && + grep -F "no threads support, ignoring pack.threads" err +' + # # WARNING! # -- 2.11.0
[PATCH v2 24/29] grep: move is_fixed() earlier to avoid forward declaration
Move the is_fixed() function which are currently only used in compile_regexp() earlier so it can be used in the PCRE family of functions in a later change. Signed-off-by: Ævar Arnfjörð Bjarmason--- grep.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/grep.c b/grep.c index 07512346b1..1157529115 100644 --- a/grep.c +++ b/grep.c @@ -321,6 +321,18 @@ static NORETURN void compile_regexp_failed(const struct grep_pat *p, die("%s'%s': %s", where, p->pattern, error); } +static int is_fixed(const char *s, size_t len) +{ + size_t i; + + for (i = 0; i < len; i++) { + if (is_regex_special(s[i])) + return 0; + } + + return 1; +} + static int has_null(const char *s, size_t len) { /* @@ -402,18 +414,6 @@ static void free_pcre1_regexp(struct grep_pat *p) } #endif /* !USE_LIBPCRE1 */ -static int is_fixed(const char *s, size_t len) -{ - size_t i; - - for (i = 0; i < len; i++) { - if (is_regex_special(s[i])) - return 0; - } - - return 1; -} - static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt) { struct strbuf sb = STRBUF_INIT; -- 2.11.0
[PATCH v2 29/29] grep: assert that threading is enabled when calling grep_{lock,unlock}
Change the grep_{lock,unlock} functions to assert that num_threads is true, instead of only locking & unlocking the pthread mutex lock when it is. These functions are never called when num_threads isn't true, this logic has gone through multiple iterations since the initial introduction of grep threading in commit 5b594f457a ("Threaded grep", 2010-01-25), but ever since then they'd only be called if num_threads was true, so this check made the code confusing to read. Replace the check with an assertion, so that it's clear to the reader that this code path is never taken unless we're spawning threads. Signed-off-by: Ævar Arnfjörð Bjarmason--- builtin/grep.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index 3c721b75a5..b1095362fb 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -73,14 +73,14 @@ static pthread_mutex_t grep_mutex; static inline void grep_lock(void) { - if (num_threads) - pthread_mutex_lock(_mutex); + assert(num_threads); + pthread_mutex_lock(_mutex); } static inline void grep_unlock(void) { - if (num_threads) - pthread_mutex_unlock(_mutex); + assert(num_threads); + pthread_mutex_unlock(_mutex); } /* Signalled when a new work_item is added to todo. */ -- 2.11.0
[PATCH v2 28/29] grep: given --threads with NO_PTHREADS=YesPlease, warn
Add a warning about missing thread support when grep.threads or --threads is set to a non 0 (default) or 1 (no parallelism) value under NO_PTHREADS=YesPlease. This is for consistency with the index-pack & pack-objects commands, which also take a --threads option & are configurable via pack.threads, and have long warned about the same under NO_PTHREADS=YesPlease. Signed-off-by: Ævar Arnfjörð Bjarmason--- builtin/grep.c | 13 + t/t7810-grep.sh | 18 ++ 2 files changed, 31 insertions(+) diff --git a/builtin/grep.c b/builtin/grep.c index a191e2976b..3c721b75a5 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -289,6 +289,17 @@ static int grep_cmd_config(const char *var, const char *value, void *cb) if (num_threads < 0) die(_("invalid number of threads specified (%d) for %s"), num_threads, var); +#ifdef NO_PTHREADS + else if (num_threads && num_threads != 1) { + /* +* TRANSLATORS: %s is the configuration +* variable for tweaking threads, currently +* grep.threads +*/ + warning(_("no threads support, ignoring %s"), var); + num_threads = 0; + } +#endif } return st; @@ -1229,6 +1240,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) else if (num_threads < 0) die(_("invalid number of threads specified (%d)"), num_threads); #else + if (num_threads) + warning(_("no threads support, ignoring --threads")); num_threads = 0; #endif diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index 561709ef6a..f106387820 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -791,6 +791,24 @@ do " done +test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'grep --threads=N or pack.threads=N warns when no pthreads' ' + git grep --threads=2 Hello hello_world 2>err && + grep ^warning: err >warnings && + test_line_count = 1 warnings && + grep -F "no threads support, ignoring --threads" err && + git -c grep.threads=2 grep Hello hello_world 2>err && + grep ^warning: err >warnings && + test_line_count = 1 warnings && + grep -F "no threads support, ignoring grep.threads" err && + git -c grep.threads=2 grep --threads=4 Hello hello_world 2>err && + grep ^warning: err >warnings && + test_line_count = 2 warnings && + grep -F "no threads support, ignoring --threads" err && + grep -F "no threads support, ignoring grep.threads" err && + git -c grep.threads=0 grep --threads=0 Hello hello_world 2>err && + test_line_count = 0 err +' + test_expect_success 'grep from a subdirectory to search wider area (1)' ' mkdir -p s && ( -- 2.11.0
[PATCH v2 22/29] grep: change the internal PCRE macro names to be PCRE1
Change the internal USE_LIBPCRE define, & build options flag to use a naming convention ending in PCRE1, without changing the long-standing USE_LIBPCRE Makefile flag which enables this code. This is for preparation for libpcre2 support where having things like USE_LIBPCRE and USE_LIBPCRE2 in any more places than we absolutely need to for backwards compatibility with old Makefile arguments would be confusing. In some ways it would be better to change everything that now uses USE_LIBPCRE to use USE_LIBPCRE1, and to make specifying USE_LIBPCRE (or --with-pcre) an error. This would impose a one-time burden on packagers of git to s/USE_LIBPCRE/USE_LIBPCRE1/ in their build scripts. However I'd like to leave the door open to making USE_LIBPCRE=YesPlease eventually mean USE_LIBPCRE2=YesPlease, i.e. once PCRE v2 is ubiquitous enough that it makes sense to make it the default. This code and the USE_LIBPCRE Makefile argument was added in commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09). At the time there was no indication that the PCRE project would release an entirely new & incompatible API around 3 years later. Signed-off-by: Ævar Arnfjörð Bjarmason--- Makefile | 4 ++-- grep.c| 6 +++--- grep.h| 2 +- t/test-lib.sh | 2 +- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/Makefile b/Makefile index d1587452f1..374fbc7e58 100644 --- a/Makefile +++ b/Makefile @@ -1088,7 +1088,7 @@ ifdef NO_LIBGEN_H endif ifdef USE_LIBPCRE - BASIC_CFLAGS += -DUSE_LIBPCRE + BASIC_CFLAGS += -DUSE_LIBPCRE1 ifdef LIBPCREDIR BASIC_CFLAGS += -I$(LIBPCREDIR)/include EXTLIBS += -L$(LIBPCREDIR)/$(lib) $(CC_LD_DYNPATH)$(LIBPCREDIR)/$(lib) @@ -2240,7 +2240,7 @@ GIT-BUILD-OPTIONS: FORCE @echo TAR=\''$(subst ','\'',$(subst ','\'',$(TAR)))'\' >>$@+ @echo NO_CURL=\''$(subst ','\'',$(subst ','\'',$(NO_CURL)))'\' >>$@+ @echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+ - @echo USE_LIBPCRE=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+ + @echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+ @echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+ @echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+ @echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+ diff --git a/grep.c b/grep.c index 79eb681c6e..854f2404be 100644 --- a/grep.c +++ b/grep.c @@ -333,7 +333,7 @@ static int has_null(const char *s, size_t len) return 0; } -#ifdef USE_LIBPCRE +#ifdef USE_LIBPCRE1 static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt) { const char *error; @@ -385,7 +385,7 @@ static void free_pcre_regexp(struct grep_pat *p) pcre_free(p->pcre_extra_info); pcre_free((void *)p->pcre_tables); } -#else /* !USE_LIBPCRE */ +#else /* !USE_LIBPCRE1 */ static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt) { die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE"); @@ -400,7 +400,7 @@ static int pcrematch(struct grep_pat *p, const char *line, const char *eol, static void free_pcre_regexp(struct grep_pat *p) { } -#endif /* !USE_LIBPCRE */ +#endif /* !USE_LIBPCRE1 */ static int is_fixed(const char *s, size_t len) { diff --git a/grep.h b/grep.h index 267534ca24..073b0e4c92 100644 --- a/grep.h +++ b/grep.h @@ -1,7 +1,7 @@ #ifndef GREP_H #define GREP_H #include "color.h" -#ifdef USE_LIBPCRE +#ifdef USE_LIBPCRE1 #include #else typedef int pcre; diff --git a/t/test-lib.sh b/t/test-lib.sh index 6abf1d1918..e5cfbcc36b 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -1010,7 +1010,7 @@ esac ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1 test -z "$NO_PERL" && test_set_prereq PERL test -z "$NO_PYTHON" && test_set_prereq PYTHON -test -n "$USE_LIBPCRE" && test_set_prereq PCRE +test -n "$USE_LIBPCRE1" && test_set_prereq PCRE test -z "$NO_GETTEXT" && test_set_prereq GETTEXT # Can we rely on git's output in the C locale? -- 2.11.0
[PATCH v2 21/29] grep: factor test for \0 in grep patterns into a function
Factor the test for \0 in grep patterns into a function. Since commit 9eceddeec6 ("Use kwset in grep", 2011-08-21) any pattern containing a \0 is considered fixed as regcomp() can't handle it. This limitation was never documented, and other some regular expression engines are capable of compiling a pattern containing a \0. Factoring this out makes a subsequent change which does that smaller. See a previous commit in this series ("grep: add tests to fix blind spots with \0 patterns", 2017-04-21) for further details & rationale. While I'm at it make the comment conform to the style guide, i.e. add an opening "/*\n". Signed-off-by: Ævar Arnfjörð Bjarmason--- grep.c | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/grep.c b/grep.c index bf6c2494fd..79eb681c6e 100644 --- a/grep.c +++ b/grep.c @@ -321,6 +321,18 @@ static NORETURN void compile_regexp_failed(const struct grep_pat *p, die("%s'%s': %s", where, p->pattern, error); } +static int has_null(const char *s, size_t len) +{ + /* +* regcomp cannot accept patterns with NULs so when using it +* we consider any pattern containing a NUL fixed. +*/ + if (memchr(s, 0, len)) + return 1; + + return 0; +} + #ifdef USE_LIBPCRE static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt) { @@ -394,12 +406,6 @@ static int is_fixed(const char *s, size_t len) { size_t i; - /* regcomp cannot accept patterns with NULs so we -* consider any pattern containing a NUL fixed. -*/ - if (memchr(s, 0, len)) - return 1; - for (i = 0; i < len; i++) { if (is_regex_special(s[i])) return 0; @@ -451,7 +457,7 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt) * simple string match using kws. p->fixed tells us if we * want to use kws. */ - if (opt->fixed || is_fixed(p->pattern, p->patternlen)) + if (opt->fixed || has_null(p->pattern, p->patternlen) || is_fixed(p->pattern, p->patternlen)) p->fixed = !icase || ascii_only; else p->fixed = 0; -- 2.11.0
[PATCH v2 17/29] perf: add a performance comparison of fixed-string grep
Add a performance comparison test which compares both case-sensitive & case-insensitive fixed-string grep, as well as non-ASCII case-sensitive & case-insensitive grep. Currently only the "-i æ" performance test doesn't go through the same kwset.[ch] codepath, see the "Even when -F..." comment in grep.c. $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux ./run p7821-grep-engines-fixed.sh -- 7821.1: fixed grep int 1.75(1.39+0.34) 7821.2: basic grep int 1.68(1.38+0.28) 7821.3: extended grep int 1.75(1.41+0.29) 7821.4: perl grep int 1.73(1.40+0.30) 7821.6: fixed grep -i int 1.94(1.54+0.35) 7821.7: basic grep -i int 1.92(1.57+0.32) 7821.8: extended grep -i int 1.90(1.54+0.30) 7821.9: perl grep -i int 1.91(1.53+0.36) 7821.11: fixed grep æ 1.35(1.14+0.18) 7821.12: basic grep æ 1.34(1.16+0.16) 7821.13: extended grep æ 1.33(1.15+0.17) 7821.14: perl grep æ 1.35(1.12+0.20) 7821.16: fixed grep -i æ 0.72(0.49+0.22) 7821.17: basic grep -i æ 0.74(0.49+0.21) 7821.18: extended grep -i æ0.72(0.48+0.22) 7821.19: perl grep -i æ0.71(0.44+0.23) I'm planning to make that not be the case, this performance test gives a baseline for comparing performance before & after any such change. See commit ("perf: add a performance comparison test of grep -G, -E and -P", 2017-04-19) for details on the machine the above test run was executed on. Signed-off-by: Ævar Arnfjörð Bjarmason--- t/perf/p7821-grep-engines-fixed.sh | 26 ++ 1 file changed, 26 insertions(+) create mode 100755 t/perf/p7821-grep-engines-fixed.sh diff --git a/t/perf/p7821-grep-engines-fixed.sh b/t/perf/p7821-grep-engines-fixed.sh new file mode 100755 index 00..d771cccfdf --- /dev/null +++ b/t/perf/p7821-grep-engines-fixed.sh @@ -0,0 +1,26 @@ +#!/bin/sh + +test_description="Comparison of fixed string grep under git-grep's regex engines" + +. ./perf-lib.sh + +test_perf_large_repo +test_checkout_worktree + +for args in 'int' '-i int' 'æ' '-i æ' +do + for engine in fixed basic extended perl + do + test_perf "$engine grep $args" " + git -c grep.patternType=$engine grep $args >'out.$engine.$args' || : + " + done + + test_expect_success "assert that all engines found the same for $args" " + test_cmp 'out.fixed.$args' 'out.basic.$args' && + test_cmp 'out.fixed.$args' 'out.extended.$args' && + test_cmp 'out.fixed.$args' 'out.perl.$args' + " +done + +test_done -- 2.11.0
[PATCH v2 19/29] grep: remove redundant regflags assignment under PCRE
Remove a redundant assignment to the "regflags" variable. This variable is only used for POSIX regular expression matching, not when the PCRE library is used. This redundant assignment was added as a result of copy/paste programming in commit 84befcd0a4 ("grep: add a grep.patternType configuration setting", 2012-08-03). That commit modified already working code in commit cca2c172e0 ("git-grep: do not die upon -F/-P when grep.extendedRegexp is set.", 2011-05-09) which didn't assign to regflags when under PCRE. Revert back to that behavior, more to reduce "wait this is used under PCRE how?" confusion when reading the code, than to to save ourselves trivial CPU cycles by removing one assignment. Signed-off-by: Ævar Arnfjörð Bjarmason--- grep.c | 1 - 1 file changed, 1 deletion(-) diff --git a/grep.c b/grep.c index 47cee45067..59ae7809f2 100644 --- a/grep.c +++ b/grep.c @@ -197,7 +197,6 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st case GREP_PATTERN_TYPE_PCRE: opt->fixed = 0; opt->pcre = 1; - opt->regflags &= ~REG_EXTENDED; break; } } -- 2.11.0
[PATCH v2 13/29] grep: add tests to fix blind spots with \0 patterns
Address a big blind spot in the tests for patterns containing \0. The is_fixed() function considers any string that contains \0 fixed, even if it contains regular expression metacharacters, those patterns are currently matched with kwset. Before this change removing that memchr(s, 0, len) check from is_fixed() wouldn't change the result of any of the tests, since regcomp() will happily match the part before the \0. Furthermore, the kwset path is dependent on whether the the -i flag is on, and whether the pattern has any non-ASCII characters, but none of this was tested for. See the a previous commit in this series ("grep: add tests to fix blind spots with \0 patterns", 2017-04-21) for further details & rationale. Signed-off-by: Ævar Arnfjörð Bjarmason--- t/t7008-grep-binary.sh | 71 ++ 1 file changed, 71 insertions(+) diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh index e7754c3946..079395ba54 100755 --- a/t/t7008-grep-binary.sh +++ b/t/t7008-grep-binary.sh @@ -22,6 +22,18 @@ nul_match () { printf '$pattern' | q_to_nul >f && test_must_fail git grep -f f $flags a " + elif test "$status" = T1 + then + test_expect_failure "git grep -f f $flags '$pattern_human' a" " + printf '$pattern' | q_to_nul >f && + git grep -f f $flags a + " + elif test "$status" = T0 + then + test_expect_failure "git grep -f f $flags '$pattern_human' a" " + printf '$pattern' | q_to_nul >f && + test_must_fail git grep -f f $flags a + " else test_expect_success "PANIC: Test framework error. Unknown status $status" 'false' fi @@ -98,6 +110,65 @@ nul_match 1 '-Fi' 'YQf' nul_match 0 '-Fi' 'YQx' nul_match 1 '' 'yQf' nul_match 0 '' 'yQx' +nul_match 1 '' 'æQð' +nul_match 1 '-F' 'eQm[*]c' +nul_match 1 '-Fi' 'EQM[*]C' + +# Regex patterns that would match but shouldn't with -F +nul_match 0 '-F' 'yQ[f]' +nul_match 0 '-F' '[y]Qf' +nul_match 0 '-Fi' 'YQ[F]' +nul_match 0 '-Fi' '[Y]QF' +nul_match 0 '-F' 'æQ[ð]' +nul_match 0 '-F' '[æ]Qð' +nul_match 0 '-Fi' 'ÆQ[Ð]' +nul_match 0 '-Fi' '[Æ]QÐ' + +# kwset is disabled on -i & non-ASCII. No way to match non-ASCII \0 +# patterns case-insensitively. +nul_match T1 '-i' 'ÆQÐ' + +# \0 implicitly disables regexes. This is an undocumented internal +# limitation. +nul_match T1 '' 'yQ[f]' +nul_match T1 '' '[y]Qf' +nul_match T1 '-i' 'YQ[F]' +nul_match T1 '-i' '[Y]Qf' +nul_match T1 '' 'æQ[ð]' +nul_match T1 '' '[æ]Qð' +nul_match T1 '-i' 'ÆQ[Ð]' + +# ... because of \0 implicitly disabling regexes regexes that +# should/shouldn't match don't do the right thing. +nul_match T1 '' 'eQm.*cQ' +nul_match T1 '-i' 'EQM.*cQ' +nul_match T0 '' 'eQm[*]c' +nul_match T0 '-i' 'EQM[*]C' + +# Due to the REG_STARTEND extension when kwset() is disabled on -i & +# non-ASCII the string will be matched in its entirety, but the +# pattern will be cut off at the first \0. +nul_match 0 '-i' 'NOMATCHQð' +nul_match T0 '-i' '[Æ]QNOMATCH' +nul_match T0 '-i' '[æ]QNOMATCH' +# Matches, but for the wrong reasons, just stops at [æ] +nul_match 1 '-i' '[Æ]Qð' +nul_match 1 '-i' '[æ]Qð' + +# Ensure that the matcher doesn't regress to something that stops at +# \0 +nul_match 0 '-F' 'yQ[f]' +nul_match 0 '-Fi' 'YQ[F]' +nul_match 0 '' 'yQNOMATCH' +nul_match 0 '' 'QNOMATCH' +nul_match 0 '-i' 'YQNOMATCH' +nul_match 0 '-i' 'QNOMATCH' +nul_match 0 '-F' 'æQ[ð]' +nul_match 0 '-Fi' 'ÆQ[Ð]' +nul_match 0 '' 'yQNÓMATCH' +nul_match 0 '' 'QNÓMATCH' +nul_match 0 '-i' 'YQNÓMATCH' +nul_match 0 '-i' 'QNÓMATCH' test_expect_success 'grep respects binary diff attribute' ' echo text >t && -- 2.11.0
[PATCH v2 14/29] perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do
Add a git GIT_PERF_MAKE_COMMAND variable to compliment the existing GIT_PERF_MAKE_OPTS facility. This allows specifying an arbitrary shell command to execute instead of 'make'. This is useful e.g. in cases where the name, semantics or defaults of a Makefile flag have changed over time. It can even be used to change the contents of the tree, useful for monkeypatching ancient versions of git to get them to build. This opens Pandora's box in some ways, it's now possible to "jailbreak" the perf environment and e.g. modify the source tree via this arbitrary instead of just issuing a custom "make" command, such a command has to be re-entrant in the sense that subsequent perf runs will re-use the possibly modified tree. It would be pointless to try to mitigate or work around that caveat in a tool purely aimed at Git developers, so this change makes no attempt to do so. Signed-off-by: Ævar Arnfjörð Bjarmason--- Makefile | 3 +++ t/perf/README | 19 +-- t/perf/run| 11 +-- 3 files changed, 29 insertions(+), 4 deletions(-) diff --git a/Makefile b/Makefile index eedadb8056..d1587452f1 100644 --- a/Makefile +++ b/Makefile @@ -2272,6 +2272,9 @@ endif ifdef GIT_PERF_MAKE_OPTS @echo GIT_PERF_MAKE_OPTS=\''$(subst ','\'',$(subst ','\'',$(GIT_PERF_MAKE_OPTS)))'\' >>$@+ endif +ifdef GIT_PERF_MAKE_COMMAND + @echo GIT_PERF_MAKE_COMMAND=\''$(subst ','\'',$(subst ','\'',$(GIT_PERF_MAKE_COMMAND)))'\' >>$@+ +endif ifdef GIT_INTEROP_MAKE_OPTS @echo GIT_INTEROP_MAKE_OPTS=\''$(subst ','\'',$(subst ','\'',$(GIT_INTEROP_MAKE_OPTS)))'\' >>$@+ endif diff --git a/t/perf/README b/t/perf/README index 49ea4349be..b3d95042a8 100644 --- a/t/perf/README +++ b/t/perf/README @@ -60,8 +60,23 @@ You can set the following variables (also in your config.mak): GIT_PERF_MAKE_OPTS Options to use when automatically building a git tree for - performance testing. E.g., -j6 would be useful. - + performance testing. E.g., -j6 would be useful. Passed + directly to make as "make $GIT_PERF_MAKE_OPTS". + +GIT_PERF_MAKE_COMMAND + An arbitrary command that'll be run in place of the make + command, if set the GIT_PERF_MAKE_OPTS variable is + ignored. Useful in cases where source tree changes might + require issuing a different make command to different + revisions. + + This can be (ab)used to monkeypatch or otherwise change the + tree about to be built. Note that the build directory can be + re-used for subsequent runs so the make command might get + executed multiple times on the same tree, but don't count on + any of that, that's an implementation detail that might change + in the future. + GIT_PERF_REPO GIT_PERF_LARGE_REPO Repositories to copy for the performance tests. The normal diff --git a/t/perf/run b/t/perf/run index c788d713ae..b61024a830 100755 --- a/t/perf/run +++ b/t/perf/run @@ -37,8 +37,15 @@ build_git_rev () { cp "../../$config" "build/$rev/" fi done - (cd build/$rev && make $GIT_PERF_MAKE_OPTS) || - die "failed to build revision '$mydir'" + ( + cd build/$rev && + if test -n "$GIT_PERF_MAKE_COMMAND" + then + sh -c "$GIT_PERF_MAKE_COMMAND" + else + make $GIT_PERF_MAKE_OPTS + fi + ) || die "failed to build revision '$mydir'" } run_dirs_helper () { -- 2.11.0
[PATCH v2 16/29] perf: add a performance comparison test of grep -G, -E and -P
Add a very basic performance comparison test comparing the POSIX basic, extended and perl engines. In theory the "basic" and "extended" engines should be implemented using the same underlying code with a slightly different pattern parser, but some implementations may not do this. Jump through some slight hoops to test both, which is worthwhile since "basic" is the default. Running this on an i7 3.4GHz Linux 4.9.0-2 Debian testing against a checkout of linux.git & latest upstream PCRE, both PCRE and git compiled with -O3 using gcc 7.1.1: $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux ./run p7820-grep-engines.sh [...] - 7820.1: basic grep how.to 0.28(1.27+0.41) 7820.2: extended grep how.to 0.29(1.23+0.44) 7820.3: perl grep how.to 0.27(1.14+0.47) 7820.5: basic grep ^how to0.28(1.18+0.46) 7820.6: extended grep ^how to 0.28(1.25+0.40) 7820.7: perl grep ^how to 0.48(2.76+0.40) 7820.9: basic grep [how] to 0.42(2.20+0.42) 7820.10: extended grep [how] to 0.41(2.12+0.50) 7820.11: perl grep [how] to 0.48(2.59+0.42) 7820.13: basic grep \(e.t[^ ]*\|v.ry\) rare 0.55(3.26+0.44) 7820.14: extended grep (e.t[^ ]*|v.ry) rare 0.54(3.34+0.38) 7820.15: perl grep (e.t[^ ]*|v.ry) rare 0.88(5.82+0.35) 7820.17: basic grep m\(ú\|u\)lt.b\(æ\|y\)te 0.29(1.26+0.48) 7820.18: extended grep m(ú|u)lt.b(æ|y)te 0.29(1.36+0.39) 7820.19: perl grep m(ú|u)lt.b(æ|y)te 0.32(1.61+0.43) Signed-off-by: Ævar Arnfjörð Bjarmason--- t/perf/p7820-grep-engines.sh | 35 +++ 1 file changed, 35 insertions(+) create mode 100755 t/perf/p7820-grep-engines.sh diff --git a/t/perf/p7820-grep-engines.sh b/t/perf/p7820-grep-engines.sh new file mode 100755 index 00..eb439900f3 --- /dev/null +++ b/t/perf/p7820-grep-engines.sh @@ -0,0 +1,35 @@ +#!/bin/sh + +test_description="Comparison of git-grep's regex engines" + +. ./perf-lib.sh + +test_perf_large_repo +test_checkout_worktree + +for pattern in \ + 'how.to' \ + '^how to' \ + '[how] to' \ + '\(e.t[^ ]*\|v.ry\) rare' \ + 'm\(ú\|u\)lt.b\(æ\|y\)te' +do + for engine in basic extended perl + do + if test $engine != "basic" + then + # Poor man's basic -> extended converter. + pattern=$(echo $pattern | sed 's/\\//g') + fi + test_perf "$engine grep $pattern" " + git -c grep.patternType=$engine grep -- '$pattern' >'out.$engine' || : + " + done + + test_expect_success "assert that all engines found the same for $pattern" " + test_cmp 'out.basic' 'out.extended' && + test_cmp 'out.basic' 'out.perl' + " +done + +test_done -- 2.11.0
[PATCH v2 15/29] perf: emit progress output when unpacking & building
Amend the t/perf/run output so that in addition to the "Running N tests" heading currently being emitted, it also emits "Unpacking $rev" and "Building $rev" when setting up the build/$rev directory & when building it, respectively. This makes it easier to see what's going on and what revision is being tested as the output scrolls by. Signed-off-by: Ævar Arnfjörð Bjarmason--- t/perf/run | 2 ++ 1 file changed, 2 insertions(+) diff --git a/t/perf/run b/t/perf/run index b61024a830..beb4acc0e4 100755 --- a/t/perf/run +++ b/t/perf/run @@ -24,6 +24,7 @@ run_one_dir () { unpack_git_rev () { rev=$1 + echo "=== Unpacking $rev in build/$rev ===" mkdir -p build/$rev (cd "$(git rev-parse --show-cdup)" && git archive --format=tar $rev) | (cd build/$rev && tar x) @@ -37,6 +38,7 @@ build_git_rev () { cp "../../$config" "build/$rev/" fi done + echo "=== Building $rev ===" ( cd build/$rev && if test -n "$GIT_PERF_MAKE_COMMAND" -- 2.11.0
[PATCH v2 18/29] grep: catch a missing enum in switch statement
Add a die(...) to a default case for the switch statement selecting between grep pattern types under --recurse-submodules. Normally this would be caught by -Wswitch, but the grep_pattern_type type is converted to int by going through parse_options(). Changing the argument type passed to compile_submodule_options() won't work, the value will just get coerced. The -Wswitch-default warning will warn about it, but that produces a lot of noise across the codebase, this potential issue would be drowned in that noise. Thus catching this at runtime is the least worst option. This won't ever trigger in practice, but if a new pattern type were to be added this catches an otherwise silent bug during development. See commit 0281e487fd ("grep: optionally recurse into submodules", 2016-12-16) for the initial addition of this code. Signed-off-by: Ævar Arnfjörð Bjarmason--- builtin/grep.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/builtin/grep.c b/builtin/grep.c index 3ffb5b4e81..a191e2976b 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -495,6 +495,8 @@ static void compile_submodule_options(const struct grep_opt *opt, break; case GREP_PATTERN_TYPE_UNSPECIFIED: break; + default: + die("BUG: Added a new grep pattern type without updating switch statement"); } for (pattern = opt->pattern_list; pattern != NULL; -- 2.11.0
[PATCH v2 10/29] grep: add tests for grep pattern types being passed to submodules
Add testing for grep pattern types being correctly passed to submodules. The pattern "(.|.)[\d]" matches differently under fixed (not at all), and then matches different lines under basic/extended & perl regular expressions, so this change asserts that the pattern type is passed along correctly. Signed-off-by: Ævar Arnfjörð Bjarmason--- t/t7814-grep-recurse-submodules.sh | 49 ++ 1 file changed, 49 insertions(+) diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh index 1472855e1d..3a58197f47 100755 --- a/t/t7814-grep-recurse-submodules.sh +++ b/t/t7814-grep-recurse-submodules.sh @@ -313,4 +313,53 @@ test_incompatible_with_recurse_submodules () test_incompatible_with_recurse_submodules --untracked test_incompatible_with_recurse_submodules --no-index +test_expect_success 'grep --recurse-submodules should pass the pattern type along' ' + # Fixed + test_must_fail git grep -F --recurse-submodules -e "(.|.)[\d]" && + test_must_fail git -c grep.patternType=fixed grep --recurse-submodules -e "(.|.)[\d]" && + + # Basic + git grep -G --recurse-submodules -e "(.|.)[\d]" >actual && + cat >expect <<-\EOF && + a:(1|2)d(3|4) + submodule/a:(1|2)d(3|4) + submodule/sub/a:(1|2)d(3|4) + EOF + test_cmp expect actual && + git -c grep.patternType=basic grep --recurse-submodules -e "(.|.)[\d]" >actual && + test_cmp expect actual && + + # Extended + git grep -E --recurse-submodules -e "(.|.)[\d]" >actual && + cat >expect <<-\EOF && + .gitmodules:[submodule "submodule"] + .gitmodules:path = submodule + .gitmodules:url = ./submodule + a:(1|2)d(3|4) + submodule/.gitmodules:[submodule "sub"] + submodule/a:(1|2)d(3|4) + submodule/sub/a:(1|2)d(3|4) + EOF + test_cmp expect actual && + git -c grep.patternType=extended grep --recurse-submodules -e "(.|.)[\d]" >actual && + test_cmp expect actual && + git -c grep.extendedRegexp=true grep --recurse-submodules -e "(.|.)[\d]" >actual && + test_cmp expect actual && + + # Perl + if test_have_prereq PCRE + then + git grep -P --recurse-submodules -e "(.|.)[\d]" >actual && + cat >expect <<-\EOF && + a:(1|2)d(3|4) + b/b:(3|4) + submodule/a:(1|2)d(3|4) + submodule/sub/a:(1|2)d(3|4) + EOF + test_cmp expect actual && + git -c grep.patternType=perl grep --recurse-submodules -e "(.|.)[\d]" >actual && + test_cmp expect actual + fi +' + test_done -- 2.11.0
[PATCH v2 12/29] grep: prepare for testing binary regexes containing rx metacharacters
Add setup code needed for testing regexes that contain both binary data and regex metacharacters. The POSIX regcomp() function inherently can't support that, because it takes a \0-delimited char *, but other regex engines APIs like PCRE v2 take a pattern/length pair, and are thus able to handle \0s in patterns as well as any other character. When kwset was imported in commit 9eceddeec6 ("Use kwset in grep", 2011-08-21) this limitation was fixed, but at the expense of introducing the undocumented limitation that any pattern containing \0 implicitly becomes a fixed match (equivalent to -F having been provided). That's not something we'd like to keep in the future. The inability to match patterns containing \0 is a leaky implementation detail. So add tests as a first step towards changing that. In order to test that \0-patterns can properly match as regexes the test string needs to have some regex metacharacters in it. There were other blind spots in the tests. The code around kwset specially handles case-insensitive & non-ASCII data, but there were no tests for this. Fix all of that by amending the text being matched to contain both regex metacharacters & non-ASCII data. Signed-off-by: Ævar Arnfjörð Bjarmason--- t/t7008-grep-binary.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh index 70e7868829..e7754c3946 100755 --- a/t/t7008-grep-binary.sh +++ b/t/t7008-grep-binary.sh @@ -28,7 +28,7 @@ nul_match () { } test_expect_success 'setup' " - echo 'binaryQfile' | q_to_nul >a && + echo 'binaryQfileQm[*]cQ*æQð' | q_to_nul >a && git add a && git commit -m. " @@ -162,7 +162,7 @@ test_expect_success 'grep does not honor textconv' ' ' test_expect_success 'grep --textconv honors textconv' ' - echo "a:binaryQfile" >expect && + echo "a:binaryQfileQm[*]cQ*æQð" >expect && git grep --textconv Qfile >actual && test_cmp expect actual ' @@ -172,7 +172,7 @@ test_expect_success 'grep --no-textconv does not honor textconv' ' ' test_expect_success 'grep --textconv blob honors textconv' ' - echo "HEAD:a:binaryQfile" >expect && + echo "HEAD:a:binaryQfileQm[*]cQ*æQð" >expect && git grep --textconv Qfile HEAD:a >actual && test_cmp expect actual ' -- 2.11.0
[PATCH v2 11/29] grep: add a test helper function for less verbose -f \0 tests
Add a helper function to make the tests which check for patterns with \0 in them more succinct. Right now this isn't a big win, but subsequent commits will add a lot more of these tests. The helper is based on the match() function in t3070-wildmatch.sh. Signed-off-by: Ævar Arnfjörð Bjarmason--- t/t7008-grep-binary.sh | 58 +- 1 file changed, 29 insertions(+), 29 deletions(-) diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh index 9c9c378119..70e7868829 100755 --- a/t/t7008-grep-binary.sh +++ b/t/t7008-grep-binary.sh @@ -4,6 +4,29 @@ test_description='git grep in binary files' . ./test-lib.sh +nul_match () { + status=$1 + flags=$2 + pattern=$3 + pattern_human=$(echo "$pattern" | sed 's/Q//g') + + if test "$status" = 1 + then + test_expect_success "git grep -f f $flags '$pattern_human' a" " + printf '$pattern' | q_to_nul >f && + git grep -f f $flags a + " + elif test "$status" = 0 + then + test_expect_success "git grep -f f $flags '$pattern_human' a" " + printf '$pattern' | q_to_nul >f && + test_must_fail git grep -f f $flags a + " + else + test_expect_success "PANIC: Test framework error. Unknown status $status" 'false' + fi +} + test_expect_success 'setup' " echo 'binaryQfile' | q_to_nul >a && git add a && @@ -69,35 +92,12 @@ test_expect_failure 'git grep .fi a' ' git grep .fi a ' -test_expect_success 'git grep -F yf a' " - printf 'yQf' | q_to_nul >f && - git grep -f f -F a -" - -test_expect_success 'git grep -F yx a' " - printf 'yQx' | q_to_nul >f && - test_must_fail git grep -f f -F a -" - -test_expect_success 'git grep -Fi Yf a' " - printf 'YQf' | q_to_nul >f && - git grep -f f -Fi a -" - -test_expect_success 'git grep -Fi Yx a' " - printf 'YQx' | q_to_nul >f && - test_must_fail git grep -f f -Fi a -" - -test_expect_success 'git grep yf a' " - printf 'yQf' | q_to_nul >f && - git grep -f f a -" - -test_expect_success 'git grep yx a' " - printf 'yQx' | q_to_nul >f && - test_must_fail git grep -f f a -" +nul_match 1 '-F' 'yQf' +nul_match 0 '-F' 'yQx' +nul_match 1 '-Fi' 'YQf' +nul_match 0 '-Fi' 'YQx' +nul_match 1 '' 'yQf' +nul_match 0 '' 'yQx' test_expect_success 'grep respects binary diff attribute' ' echo text >t && -- 2.11.0
[PATCH v2 03/29] test-lib: rename the LIBPCRE prerequisite to PCRE
Rename the LIBPCRE prerequisite to PCRE. This is for preparation for libpcre2 support, where having just "LIBPCRE" would be confusing as it implies v1 of the library. None of these tests are incompatible between versions 1 & 2 of libpcre, it's less confusing to give them a more general name to make it clear that they work on both library versions. Signed-off-by: Ævar Arnfjörð Bjarmason--- t/README| 4 ++-- t/t7810-grep.sh | 28 ++-- t/t7812-grep-icase-non-ascii.sh | 4 ++-- t/t7813-grep-icase-iso.sh | 2 +- t/test-lib.sh | 2 +- 5 files changed, 20 insertions(+), 20 deletions(-) diff --git a/t/README b/t/README index ab386c3681..a90cb62583 100644 --- a/t/README +++ b/t/README @@ -803,9 +803,9 @@ use these, and "test_set_prereq" for how to define your own. Test is not run by root user, and an attempt to write to an unwritable file is expected to fail correctly. - - LIBPCRE + - PCRE - Git was compiled with USE_LIBPCRE=YesPlease. Wrap any tests + Git was compiled with support for PCRE. Wrap any tests that use git-grep --perl-regexp or git-grep -P in these. - CASE_INSENSITIVE_FS diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index cee42097b0..c84c4d99f9 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -275,7 +275,7 @@ do test_cmp expected actual ' - test_expect_success LIBPCRE "grep $L with grep.patterntype=perl" ' + test_expect_success PCRE "grep $L with grep.patterntype=perl" ' echo "${HC}ab:a+b*c" >expected && git -c grep.patterntype=perl grep "a\x{2b}b\x{2a}c" $H ab >actual && test_cmp expected actual @@ -1053,12 +1053,12 @@ hello.c:int main(int argc, const char **argv) hello.c: printf("Hello world.\n"); EOF -test_expect_success LIBPCRE 'grep --perl-regexp pattern' ' +test_expect_success PCRE 'grep --perl-regexp pattern' ' git grep --perl-regexp "\p{Ps}.*?\p{Pe}" hello.c >actual && test_cmp expected actual ' -test_expect_success LIBPCRE 'grep -P pattern' ' +test_expect_success PCRE 'grep -P pattern' ' git grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual && test_cmp expected actual ' @@ -1070,13 +1070,13 @@ test_expect_success 'grep pattern with grep.extendedRegexp=true' ' test_cmp empty actual ' -test_expect_success LIBPCRE 'grep -P pattern with grep.extendedRegexp=true' ' +test_expect_success PCRE 'grep -P pattern with grep.extendedRegexp=true' ' git -c grep.extendedregexp=true \ grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual && test_cmp expected actual ' -test_expect_success LIBPCRE 'grep -P -v pattern' ' +test_expect_success PCRE 'grep -P -v pattern' ' { echo "ab:a+b*c" echo "ab:a+bc" @@ -1085,7 +1085,7 @@ test_expect_success LIBPCRE 'grep -P -v pattern' ' test_cmp expected actual ' -test_expect_success LIBPCRE 'grep -P -i pattern' ' +test_expect_success PCRE 'grep -P -i pattern' ' cat >expected <<-EOF && hello.c:printf("Hello world.\n"); EOF @@ -1093,7 +1093,7 @@ test_expect_success LIBPCRE 'grep -P -i pattern' ' test_cmp expected actual ' -test_expect_success LIBPCRE 'grep -P -w pattern' ' +test_expect_success PCRE 'grep -P -w pattern' ' { echo "hello_world:Hello world" echo "hello_world:HeLLo world" @@ -1118,11 +1118,11 @@ test_expect_success 'grep invalidpattern properly dies with grep.patternType=ext test_must_fail git -c grep.patterntype=extended grep "a[" ' -test_expect_success LIBPCRE 'grep -P invalidpattern properly dies ' ' +test_expect_success PCRE 'grep -P invalidpattern properly dies ' ' test_must_fail git grep -P "a[" ' -test_expect_success LIBPCRE 'grep invalidpattern properly dies with grep.patternType=perl' ' +test_expect_success PCRE 'grep invalidpattern properly dies with grep.patternType=perl' ' test_must_fail git -c grep.patterntype=perl grep "a[" ' @@ -1191,13 +1191,13 @@ test_expect_success 'grep pattern with grep.patternType=fixed, =basic, =perl, =e test_cmp expected actual ' -test_expect_success LIBPCRE 'grep -G -F -E -P pattern' ' +test_expect_success PCRE 'grep -G -F -E -P pattern' ' echo "d0:0" >expected && git grep -G -F -E -P "[\d]" d0 >actual && test_cmp expected actual ' -test_expect_success LIBPCRE 'grep pattern with grep.patternType=fixed, =basic, =extended, =perl' ' +test_expect_success PCRE 'grep pattern with grep.patternType=fixed, =basic, =extended, =perl' ' echo "d0:0" >expected && git \ -c grep.patterntype=fixed \ @@ -1208,7 +1208,7 @@ test_expect_success LIBPCRE 'grep pattern with grep.patternType=fixed, =basic, = test_cmp expected actual ' -test_expect_success LIBPCRE 'grep -P pattern with
[PATCH v2 05/29] grep: add a test asserting that --perl-regexp dies when !PCRE
Add a test asserting that when --perl-regexp (and -P for grep) is given to git-grep & git-log that we die with an error. In developing the PCRE v2 series I introduced a regression where -P would (through control-flow fall-through) become synonymous with basic POSIX matching. I.e. 'git grep -P '[\d]' would match "d" instead of digits. The entire test suite would still pass with this serious regression, since everything that tested for --perl-regexp would be guarded by the PCRE prerequisite, fix that blind-spot by adding tests under !PCRE asserting that git must die when given --perl-regexp or -P. Signed-off-by: Ævar Arnfjörð Bjarmason--- t/t4202-log.sh | 4 +++- t/t7810-grep.sh | 12 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/t/t4202-log.sh b/t/t4202-log.sh index e522a2fcd5..9680dfe400 100755 --- a/t/t4202-log.sh +++ b/t/t4202-log.sh @@ -364,7 +364,9 @@ test_expect_success 'log with various grep.patternType configurations & command- git log --pretty=tformat:%s --perl-regexp \ --grep="[\d]\|" >actual.perl.long-arg && test_cmp expect.perl actual.perl.long-arg - + else + test_must_fail git log --perl-regexp \ + --grep="[\d]\|" fi && test_cmp expect.fixed actual.fixed.long-arg && test_cmp expect.basic actual.basic.long-arg && diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index c84c4d99f9..8d69113695 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -281,6 +281,10 @@ do test_cmp expected actual ' + test_expect_success !PCRE "grep $L with grep.patterntype=perl errors without PCRE" ' + test_must_fail git -c grep.patterntype=perl grep "foo.*bar" + ' + test_expect_success "grep $L with grep.patternType=default and grep.extendedRegexp=true" ' echo "${HC}ab:abc" >expected && git \ @@ -1058,11 +1062,19 @@ test_expect_success PCRE 'grep --perl-regexp pattern' ' test_cmp expected actual ' +test_expect_success !PCRE 'grep --perl-regexp pattern errors without PCRE' ' + test_must_fail git grep --perl-regexp "foo.*bar" +' + test_expect_success PCRE 'grep -P pattern' ' git grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual && test_cmp expected actual ' +test_expect_success !PCRE 'grep -P pattern errors without PCRE' ' + test_must_fail git grep -P "foo.*bar" +' + test_expect_success 'grep pattern with grep.extendedRegexp=true' ' >empty && test_must_fail git -c grep.extendedregexp=true \ -- 2.11.0
[PATCH v2 09/29] grep: amend submodule recursion test for regex engine testing
Amend the submodule recursion test to prepare it for subsequent tests of whether it passes along the grep.patternType to the submodule greps. This is the result of searching & replacing: foobar -> (1|2)d(3|4) foo-> (1|2) bar-> (3|4) Currently there's no tests for whether e.g. -P or -E is correctly passed along, tests for that will be added in a follow-up change, but first add content to the tests which will match differently under different regex engines. Reuse the pattern established in an earlier commit of mine in this series ("log: add exhaustive tests for pattern style options & config", 2017-04-07). The pattern "(.|.)[\d]" will match this content differently under fixed/basic/extended & perl. This test code was originally added in commit 0281e487fd ("grep: optionally recurse into submodules", 2016-12-16). Signed-off-by: Ævar Arnfjörð Bjarmason--- t/t7814-grep-recurse-submodules.sh | 166 ++--- 1 file changed, 83 insertions(+), 83 deletions(-) diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh index 5b6eb3a65e..1472855e1d 100755 --- a/t/t7814-grep-recurse-submodules.sh +++ b/t/t7814-grep-recurse-submodules.sh @@ -9,13 +9,13 @@ submodules. . ./test-lib.sh test_expect_success 'setup directory structure and submodule' ' - echo "foobar" >a && + echo "(1|2)d(3|4)" >a && mkdir b && - echo "bar" >b/b && + echo "(3|4)" >b/b && git add a b && git commit -m "add a and b" && git init submodule && - echo "foobar" >submodule/a && + echo "(1|2)d(3|4)" >submodule/a && git -C submodule add a && git -C submodule commit -m "add a" && git submodule add ./submodule && @@ -24,18 +24,18 @@ test_expect_success 'setup directory structure and submodule' ' test_expect_success 'grep correctly finds patterns in a submodule' ' cat >expect <<-\EOF && - a:foobar - b/b:bar - submodule/a:foobar + a:(1|2)d(3|4) + b/b:(3|4) + submodule/a:(1|2)d(3|4) EOF - git grep -e "bar" --recurse-submodules >actual && + git grep -e "(3|4)" --recurse-submodules >actual && test_cmp expect actual ' test_expect_success 'grep and basic pathspecs' ' cat >expect <<-\EOF && - submodule/a:foobar + submodule/a:(1|2)d(3|4) EOF git grep -e. --recurse-submodules -- submodule >actual && @@ -44,7 +44,7 @@ test_expect_success 'grep and basic pathspecs' ' test_expect_success 'grep and nested submodules' ' git init submodule/sub && - echo "foobar" >submodule/sub/a && + echo "(1|2)d(3|4)" >submodule/sub/a && git -C submodule/sub add a && git -C submodule/sub commit -m "add a" && git -C submodule submodule add ./sub && @@ -54,117 +54,117 @@ test_expect_success 'grep and nested submodules' ' git commit -m "updated submodule" && cat >expect <<-\EOF && - a:foobar - b/b:bar - submodule/a:foobar - submodule/sub/a:foobar + a:(1|2)d(3|4) + b/b:(3|4) + submodule/a:(1|2)d(3|4) + submodule/sub/a:(1|2)d(3|4) EOF - git grep -e "bar" --recurse-submodules >actual && + git grep -e "(3|4)" --recurse-submodules >actual && test_cmp expect actual ' test_expect_success 'grep and multiple patterns' ' cat >expect <<-\EOF && - a:foobar - submodule/a:foobar - submodule/sub/a:foobar + a:(1|2)d(3|4) + submodule/a:(1|2)d(3|4) + submodule/sub/a:(1|2)d(3|4) EOF - git grep -e "bar" --and -e "foo" --recurse-submodules >actual && + git grep -e "(3|4)" --and -e "(1|2)" --recurse-submodules >actual && test_cmp expect actual ' test_expect_success 'grep and multiple patterns' ' cat >expect <<-\EOF && - b/b:bar + b/b:(3|4) EOF - git grep -e "bar" --and --not -e "foo" --recurse-submodules >actual && + git grep -e "(3|4)" --and --not -e "(1|2)" --recurse-submodules >actual && test_cmp expect actual ' test_expect_success 'basic grep tree' ' cat >expect <<-\EOF && - HEAD:a:foobar - HEAD:b/b:bar - HEAD:submodule/a:foobar - HEAD:submodule/sub/a:foobar + HEAD:a:(1|2)d(3|4) + HEAD:b/b:(3|4) + HEAD:submodule/a:(1|2)d(3|4) + HEAD:submodule/sub/a:(1|2)d(3|4) EOF - git grep -e "bar" --recurse-submodules HEAD >actual && + git grep -e "(3|4)" --recurse-submodules HEAD >actual && test_cmp expect actual ' test_expect_success 'grep tree HEAD^' ' cat >expect <<-\EOF && - HEAD^:a:foobar - HEAD^:b/b:bar - HEAD^:submodule/a:foobar + HEAD^:a:(1|2)d(3|4) + HEAD^:b/b:(3|4) + HEAD^:submodule/a:(1|2)d(3|4) EOF - git grep -e "bar" --recurse-submodules HEAD^ >actual && + git grep
[PATCH v2 02/29] grep & rev-list doc: stop promising libpcre for --perl-regexp
Stop promising in our grep & rev-list options documentation that we're always going to be using libpcre when given the --perl-regexp option. Instead talk about using "Perl-compatible regular expressions" and using these types of patterns using "a compile-time dependency". Saying "libpcre" means that we're talking about libpcre.so, which is always going to be v1. This change is part of an ongoing saga to add support for libpcre2, which comes with PCRE v2. In the future we might use some completely unrelated library to provide perl-compatible regular expression support. By wording the documentation differently and not promising any specific version of PCRE or even PCRE at all we have more wiggle room to change the implementation. Signed-off-by: Ævar Arnfjörð Bjarmason--- Documentation/git-grep.txt | 7 +-- Documentation/rev-list-options.txt | 8 ++-- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 71f32f3508..5033483db4 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -161,8 +161,11 @@ OPTIONS -P:: --perl-regexp:: - Use Perl-compatible regexp for patterns. Requires libpcre to be - compiled in. + Use Perl-compatible regular expressions for patterns. ++ +Support for these types of regular expressions is an optional +compile-time dependency. If Git wasn't compiled with support for them +providing this option will cause it to die. -F:: --fixed-strings:: diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index a02f7324c0..a46f70c2b1 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -92,8 +92,12 @@ endif::git-rev-list[] pattern as a regular expression). --perl-regexp:: - Consider the limiting patterns to be Perl-compatible regular expressions. - Requires libpcre to be compiled in. + Consider the limiting patterns to be Perl-compatible regular + expressions. ++ +Support for these types of regular expressions is an optional +compile-time dependency. If Git wasn't compiled with support for them +providing this option will cause it to die. --remove-empty:: Stop when a given path disappears from the tree. -- 2.11.0
[PATCH v2 07/29] grep: change non-ASCII -i test to stop using --debug
Change a non-ASCII case-insensitive test case to stop using --debug, and instead simply test for the expected results. The test coverage remains the same with this change, but the test won't break due to internal refactoring. This test was added in commit 793dc676e0 ("grep/icase: avoid kwsset when -F is specified", 2016-06-25). It was asserting that the regex must be compiled with compile_fixed_regexp(), instead test for the expected results, allowing the underlying implementation to change. Signed-off-by: Ævar Arnfjörð Bjarmason--- t/t7812-grep-icase-non-ascii.sh | 25 + 1 file changed, 5 insertions(+), 20 deletions(-) diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh index 04a61cb8e0..0059a1f837 100755 --- a/t/t7812-grep-icase-non-ascii.sh +++ b/t/t7812-grep-icase-non-ascii.sh @@ -36,29 +36,14 @@ test_expect_success GETTEXT_LOCALE,PCRE 'grep pcre utf-8 string with "+"' ' ' test_expect_success REGEX_LOCALE 'grep literal string, with -F' ' - git grep --debug -i -F "TILRAUN: Halló Heimur!" 2>&1 >/dev/null | -grep fixed >debug1 && - test_write_lines "fixed TILRAUN: Halló Heimur!" >expect1 && - test_cmp expect1 debug1 && - - git grep --debug -i -F "TILRAUN: HALLÓ HEIMUR!" 2>&1 >/dev/null | -grep fixed >debug2 && - test_write_lines "fixed TILRAUN: HALLÓ HEIMUR!" >expect2 && - test_cmp expect2 debug2 + git grep -i -F "TILRAUN: Halló Heimur!" && + git grep -i -F "TILRAUN: HALLÓ HEIMUR!" ' test_expect_success REGEX_LOCALE 'grep string with regex, with -F' ' - test_write_lines "^*TILR^AUN:.* \\Halló \$He[]imur!\$" >file && - - git grep --debug -i -F "^*TILR^AUN:.* \\Halló \$He[]imur!\$" 2>&1 >/dev/null | -grep fixed >debug1 && - test_write_lines "fixed \\^*TILR^AUN:\\.\\* Halló \$He\\[]imur!\\\$" >expect1 && - test_cmp expect1 debug1 && - - git grep --debug -i -F "^*TILR^AUN:.* \\HALLÓ \$HE[]IMUR!\$" 2>&1 >/dev/null | -grep fixed >debug2 && - test_write_lines "fixed \\^*TILR^AUN:\\.\\* HALLÓ \$HE\\[]IMUR!\\\$" >expect2 && - test_cmp expect2 debug2 + test_write_lines "TILRAUN: Halló Heimur [abc]!" >file3 && + git add file3 && + git grep -i -F "TILRAUN: Halló Heimur [abc]!" file3 ' test_expect_success REGEX_LOCALE 'pickaxe -i on non-ascii' ' -- 2.11.0
[PATCH v2 06/29] grep: add a test for backreferences in PCRE patterns
Add a test for backreferences such as (.)\1 in PCRE patterns. This test ensures that the PCRE_NO_AUTO_CAPTURE option isn't turned on. Before this change turning it on would break these sort of patterns, but wouldn't break any tests. Signed-off-by: Ævar Arnfjörð Bjarmason--- t/t7810-grep.sh | 7 +++ 1 file changed, 7 insertions(+) diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index 8d69113695..daa906b9b0 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -1114,6 +1114,13 @@ test_expect_success PCRE 'grep -P -w pattern' ' test_cmp expected actual ' +test_expect_success PCRE 'grep -P backreferences work (the PCRE NO_AUTO_CAPTURE flag is not set)' ' + git grep -P -h "(?P.)(?P=one)" hello_world >actual && + test_cmp hello_world actual && + git grep -P -h "(.)\1" hello_world >actual && + test_cmp hello_world actual +' + test_expect_success 'grep -G invalidpattern properly dies ' ' test_must_fail git grep -G "a[" ' -- 2.11.0
[PATCH v2 08/29] grep: add tests for --threads=N and grep.threads
Add tests for --threads=N being supplied on the command-line, or when grep.threads=N being supplied in the configuration. When the threading support was made run-time configurable in commit 89f09dd34e ("grep: add --threads= option and grep.threads configuration", 2015-12-15) no tests were added for it. In developing a change to the grep code I was able to make '--threads=1 ` segfault, while the test suite still passed. This change fixes that blind spot in the tests. In addition to asserting that asking for N threads shouldn't segfault, test that the grep output given any N is the same. The choice to test only 1..10 as opposed to 1..8 or 1..16 or whatever is arbitrary. Testing 1..1024 works locally for me (but gets noticeably slower as more threads are spawned). Given the structure of the code there's no reason to test an arbitrary number of threads, only 0, 1 and >=2 are special modes of operation. A later patch introduces a PTHREADS test prerequisite which is true under NO_PTHREADS=UnfortunatelyYes, but even under NO_PTHREADS it's fine to test --threads=N, we'll just ignore it and not use threading. So these tests also make sense under that mode to assert that --threads=N without pthreads still returns expected results. Signed-off-by: Ævar Arnfjörð Bjarmason--- t/t7810-grep.sh | 16 1 file changed, 16 insertions(+) diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index daa906b9b0..561709ef6a 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -775,6 +775,22 @@ test_expect_success 'grep -W with userdiff' ' test_cmp expected actual ' +for threads in $(test_seq 0 10) +do + test_expect_success "grep --threads=$threads & -c grep.threads=$threads" " + git grep --threads=$threads . >actual.$threads && + if test $threads -ge 1 + then + test_cmp actual.\$(($threads - 1)) actual.$threads + fi && + git -c grep.threads=$threads grep . >actual.$threads && + if test $threads -ge 1 + then + test_cmp actual.\$(($threads - 1)) actual.$threads + fi + " +done + test_expect_success 'grep from a subdirectory to search wider area (1)' ' mkdir -p s && ( -- 2.11.0
[PATCH v2 01/29] Makefile & configure: reword inaccurate comment about PCRE
Reword an outdated & inaccurate comment which suggests that only git-grep can use PCRE. This comment was added back when PCRE support was initially added in commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09), and was true at the time. It hasn't been telling the full truth since git-log learned to use PCRE with --grep in commit 727b6fc3ed ("log --grep: accept --basic-regexp and --perl-regexp", 2012-10-03), and more importantly is likely to get more inaccurate over time as more use is made of PCRE in other areas. Reword it to be more future-proof, and to more clearly explain that this enables user-initiated runtime behavior. Copy/pasting this so much in configure.ac is lame, these Makefile-like flags aren't even used by autoconf, just the corresponding --with[out]-* options. But copy/pasting the comments that make sense for the Makefile to configure.ac where they make less sense is the pattern everything else follows in that file. I'm not going to war against that as part of this change, just following the existing pattern. Signed-off-by: Ævar Arnfjörð Bjarmason--- Makefile | 6 -- configure.ac | 12 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/Makefile b/Makefile index e35542e631..eedadb8056 100644 --- a/Makefile +++ b/Makefile @@ -24,8 +24,10 @@ all:: # Define NO_OPENSSL environment variable if you do not have OpenSSL. # This also implies BLK_SHA1. # -# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be -# able to use Perl-compatible regular expressions. +# Define USE_LIBPCRE if you have and want to use libpcre. Various +# commands such as log and grep offer runtime options to use +# Perl-compatible regular expressions instead of standard or extended +# POSIX regular expressions. # # Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in # /foo/bar/include and /foo/bar/lib directories. diff --git a/configure.ac b/configure.ac index 128165529f..deeb968daa 100644 --- a/configure.ac +++ b/configure.ac @@ -250,8 +250,10 @@ AS_HELP_STRING([--with-openssl],[use OpenSSL library (default is YES)]) AS_HELP_STRING([], [ARG can be prefix for openssl library and headers]), GIT_PARSE_WITH([openssl])) -# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be -# able to use Perl-compatible regular expressions. +# Define USE_LIBPCRE if you have and want to use libpcre. Various +# commands such as log and grep offer runtime options to use +# Perl-compatible regular expressions instead of standard or extended +# POSIX regular expressions. # # Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in # /foo/bar/include and /foo/bar/lib directories. @@ -499,8 +501,10 @@ GIT_CONF_SUBST([NEEDS_SSL_WITH_CRYPTO]) GIT_CONF_SUBST([NO_OPENSSL]) # -# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be -# able to use Perl-compatible regular expressions. +# Define USE_LIBPCRE if you have and want to use libpcre. Various +# commands such as log and grep offer runtime options to use +# Perl-compatible regular expressions instead of standard or extended +# POSIX regular expressions. # if test -n "$USE_LIBPCRE"; then -- 2.11.0
[PATCH v2 04/29] log: add exhaustive tests for pattern style options & config
Add exhaustive tests for how the different grep.patternType options & the corresponding command-line options affect git-log. Before this change it was possible to patch revision.c so that the --basic-regexp option was synonymous with --extended-regexp, and --perl-regexp wasn't recognized at all, and still have 100% of the test suite pass. This was because the first test being modified here, added in commit 34a4ae55b2 ("log --grep: use the same helper to set -E/-F options as "git grep"", 2012-10-03), didn't actually check whether we'd enabled extended regular expressions as distinct from re-toggling non-fixed string support. Fix that by changing the pattern to a pattern that'll only match if --extended-regexp option is provided, but won't match under the default --basic-regexp option. Other potential regressions were possible since there were no tests for the rest of the combinations of grep.patternType configuration toggles & corresponding git-log command-line options. Add exhaustive tests for those. The patterns being passed to fixed/basic/extended/PCRE are carefully crafted to return the wrong thing if the grep engine were to pick any other matching method than the one it's told to use. Signed-off-by: Ævar Arnfjörð Bjarmason--- t/t4202-log.sh | 94 +- 1 file changed, 93 insertions(+), 1 deletion(-) diff --git a/t/t4202-log.sh b/t/t4202-log.sh index f577990716..e522a2fcd5 100755 --- a/t/t4202-log.sh +++ b/t/t4202-log.sh @@ -262,7 +262,28 @@ test_expect_success 'log --grep -i' ' test_expect_success 'log -F -E --grep= uses ere' ' echo second >expect && - git log -1 --pretty="tformat:%s" -F -E --grep=s.c.nd >actual && + # basic would need \(s\) to do the same + git log -1 --pretty="tformat:%s" -F -E --grep="(s).c.nd" >actual && + test_cmp expect actual +' + +test_expect_success PCRE 'log -F -E --perl-regexp --grep= uses PCRE' ' + test_when_finished "rm -rf num_commits" && + git init num_commits && + ( + cd num_commits && + test_commit 1d && + test_commit 2e + ) && + echo 2e >expect && + # In PCRE \d in [\d] is like saying "0-9", and matches the 2 + # in 2e... + git -C num_commits log -1 --pretty="tformat:%s" -F -E --perl-regexp --grep="[\d]" >actual && + test_cmp expect actual && + echo 1d >expect && + # ...in POSIX basic & extended it is the same as [d], + # i.e. "d", which matches 1d, but not and does not match 2e. + git -C num_commits log -1 --pretty="tformat:%s" -F -E --grep="[\d]" >actual && test_cmp expect actual ' @@ -280,6 +301,77 @@ test_expect_success 'log with grep.patternType configuration and command line' ' test_cmp expect actual ' +test_expect_success 'log with various grep.patternType configurations & command-lines' ' + git init pattern-type && + ( + cd pattern-type && + test_commit 1 file A && + + # The tagname is overridden here because creating a + # tag called "(1|2)" as test_commit would otherwise + # implicitly do would fail on e.g. MINGW. + test_commit "(1|2)" file B 2 && + + echo "(1|2)" >expect.fixed && + cp expect.fixed expect.basic && + cp expect.fixed expect.extended && + cp expect.fixed expect.perl && + + # A strcmp-like match with fixed. + git -c grep.patternType=fixed log --pretty=tformat:%s \ + --grep="(1|2)" >actual.fixed && + + # POSIX basic matches (, | and ) literally. + git -c grep.patternType=basic log --pretty=tformat:%s \ + --grep="(.|.)" >actual.basic && + + # POSIX extended needs to have | escaped to match it + # literally, whereas under basic this is the same as + # (|2), i.e. it would also match "1". This test checks + # for extended by asserting that it is not matching + # what basic would match. + git -c grep.patternType=extended log --pretty=tformat:%s \ + --grep="\|2" >actual.extended && + if test_have_prereq PCRE + then + # Only PCRE would match [\d]\| with only + # "(1|2)" due to [\d]. POSIX basic would match + # both it and "1", and POSIX extended would + # match neither. + git -c grep.patternType=perl log --pretty=tformat:%s \ + --grep="[\d]\|" >actual.perl && + test_cmp expect.perl actual.perl + fi && + test_cmp expect.fixed actual.fixed && + test_cmp expect.basic actual.basic && + test_cmp
[PATCH v2 00/29] Easy to review grep & pre-PCRE changes
Easy to review? 29 patches? Are you kidding me?! As noted in v1 (<20170511091829.5634-1-ava...@gmail.com>; https://public-inbox.org/git/20170511091829.5634-1-ava...@gmail.com/) these are all doc, test, refactoring etc. changes needed by the subsequent "PCRE v2, PCRE v1 JIT, log -P & fixes" series. Thanks a lot for the review everyone. This fixes all the issues raised. Changes noted below, with names prefixed by the person who raised the issue. Ævar Arnfjörð Bjarmason (29): Makefile & configure: reword inaccurate comment about PCRE grep & rev-list doc: stop promising libpcre for --perl-regexp test-lib: rename the LIBPCRE prerequisite to PCRE No changes. log: add exhaustive tests for pattern style options & config Johannes: Now doesn't create a "(1|2)" tag, so should work on Windows & beyond (wasn't needed, just created as a side-effect of test_commit) Junio: Added comments for tricky basic/extended/perl Junio: Moved all the 'test_have_prereq PCRE' test / test_cmp code together, not apart as before. grep: add a test asserting that --perl-regexp dies when !PCRE grep: add a test for backreferences in PCRE patterns No changes. grep: change non-ASCII -i test to stop using --debug Brandon: Removed stray leftover unused --debug grep: add tests for --threads=N and grep.threads Brandon: Amended commit message to clarify that this test doesn't need a NO_PTHREADS prerequisite, and we actually get coverage out of testing with --threads=N when not with threads, or at least it doesn't harm anything. grep: amend submodule recursion test for regex engine testing Junio: Now "foo" -> "(1|2)" as the commit message claims, not -> "(1|2)d". grep: add tests for grep pattern types being passed to submodules No changes. grep: add a test helper function for less verbose -f \0 tests Junio: "nul_match() {" -> "nul_match () {" & quote "$status" & don't quote non-variable strings. grep: prepare for testing binary regexes containing rx metacharacters No changes. grep: add tests to fix blind spots with \0 patterns Junio: Also fixed quoted variable strings here as above. perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do perf: emit progress output when unpacking & building No changes. perf: add a performance comparison test of grep -G, -E and -P All my multibyte performance tests were done with the string 'm(ú|u)ult.b(æ|y)te' which didn't match anything in the kernel, now done with 'm(ú|u)lt.b(æ|y)te' instead. I re-ran all the performance tests mentioned in the commit messages where applicable. perf: add a performance comparison of fixed-string grep One test_cmp was run twice due to rebasing from the pcre1/pcre2 days of this series. Fixed. grep: catch a missing enum in switch statement Stefan: Removed the comment about die(..BUG) & put the relevant detail in the commit message instead. grep: remove redundant regflags assignment under PCRE grep: remove redundant `regflags &= ~REG_EXTENDED` assignments No changes. grep: factor test for \0 in grep patterns into a function Brandon: Fix comment syntax creating the function, and move it to the correct place now instead of later in the "move is_fixed()" commit. grep: change the internal PCRE macro names to be PCRE1 grep: change internal *pcre* variable & function names to be *pcre1* No changes. grep: move is_fixed() earlier to avoid forward declaration Brandon: Now just moves is_fixed() instead of is_fixed() & has_null() test-lib: add a PTHREADS prerequisite pack-objects & index-pack: add test for --threads warning pack-objects: fix buggy warning about threads No changes. grep: given --threads with NO_PTHREADS=YesPlease, warn Use Git standard comment syntax for TRANSLATORS comment. grep: assert that threading is enabled when calling grep_{lock,unlock} Documentation/git-grep.txt | 7 +- Documentation/rev-list-options.txt | 8 +- Makefile | 14 ++- builtin/grep.c | 23 +++- builtin/pack-objects.c | 4 +- configure.ac | 12 ++- grep.c | 108 ++- grep.h | 10 +- t/README | 8 +- t/perf/README | 19 +++- t/perf/p7820-grep-engines.sh | 35 ++ t/perf/p7821-grep-engines-fixed.sh | 26 + t/perf/run | 13 ++- t/t4202-log.sh | 96 - t/t5300-pack-object.sh | 33 ++ t/t7008-grep-binary.sh | 135 +-- t/t7810-grep.sh| 81 +++--- t/t7812-grep-icase-non-ascii.sh| 29 ++--- t/t7813-grep-icase-iso.sh | 2 +- t/t7814-grep-recurse-submodules.sh | 215 +++-- t/test-lib.sh | 3 +- 21 files changed, 646 insertions(+), 235 deletions(-) create mode 100755
Re: [PATCH] Use https links to Wikipedia to avoid http redirects
On Sat, May 13, 2017 at 11:54 AM, Sven Strickrothwrote: > Signed-off-by: Sven Strickroth Thanks! FWIW: Reviewed-by: Ævar Arnfjörð Bjarmason > --- > Documentation/gitweb.txt | 2 +- > bisect.c | 2 +- > gitweb/gitweb.perl | 2 +- > 3 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/Documentation/gitweb.txt b/Documentation/gitweb.txt > index 96156e5e1..88450589a 100644 > --- a/Documentation/gitweb.txt > +++ b/Documentation/gitweb.txt > @@ -84,7 +84,7 @@ separator (rules for Perl's "`split(" ", $line)`"). > > * Fields use modified URI encoding, defined in RFC 3986, section 2.1 > (Percent-Encoding), or rather "Query string encoding" (see > -http://en.wikipedia.org/wiki/Query_string#URL_encoding[]), the difference > +https://en.wikipedia.org/wiki/Query_string#URL_encoding[]), the difference > being that SP (" ") can be encoded as "{plus}" (and therefore "{plus}" has > to be > also percent-encoded). > + > diff --git a/bisect.c b/bisect.c > index 08c9fb726..fa477a3e2 100644 > --- a/bisect.c > +++ b/bisect.c > @@ -546,7 +546,7 @@ static unsigned get_prn(unsigned count) { > > /* > * Custom integer square root from > - * http://en.wikipedia.org/wiki/Integer_square_root > + * https://en.wikipedia.org/wiki/Integer_square_root > */ > static int sqrti(int val) > { > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 7cf68f07b..d8209c7a0 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -8085,7 +8085,7 @@ sub git_search_help { > Pattern is by default a normal string that is matched > precisely (but without > regard to case, except in the case of pickaxe). However, when you check the > re checkbox, > the pattern entered is recognized as the POSIX extended > -http://en.wikipedia.org/wiki/Regular_expression;>regular > expression (also case > +https://en.wikipedia.org/wiki/Regular_expression;>regular > expression (also case > insensitive). > > commit > -- > 2.12.1.windows.1 >
Re: [PATCH] compat/regex: fix compilation on Windows
On Sat, May 13, 2017 at 8:30 PM, Johannes Schindelinwrote: > Hi Ævar, > > I originally replied in a very verbose manner, going step by step through > the "one-liner", but decided to rephrase everything. > > So here goes. > > On Sat, 13 May 2017, Ævar Arnfjörð Bjarmason wrote: > >> Let's drop this current gawk import series. > > Well, the reason why you imported the current gawk regex is that we (you?) > decided originally that it'd be easier to go with that one rather than > with the gnulib one (which they friendly-forked). If you switch to gnulib, > I would like to see (in the commit message) the original reason we picked > gawk (which forked gnulib's regex code, after all), and why that reason no > longer applies. It was just easier to extract the code from gawk than gnulib at the time. The thread starting at <20100715220059.GA3312@burratino> has some details. > I also would strongly prefer a *non* one-liner, in a commit message that > adds the relevant source code from gawk or gnulib *verbatim*, followed by > patches that adjust the code to Git's needs. That way, a future update can > repeat the commands outlined in the first commit message and then > cherry-pick the subsequent patches. > > And remember that you do not need to clone the entire repository. You can > 1) fetch into the current one, and 2) use a shallow fetch. Example: > > git fetch --depth 1 http://git.savannah.gnu.org/r/gawk.git \ > gawk-4.1.4 > > The next commands could be something like > > git read-tree --prefix=compat/regex/ FETCH_HEAD: > git clean -dfx -- compat/regex/ Sure, if I submit this again I'll update whatever one liner is in there to import it via git. > Oh, and please do not use `master`. If Git is any indication, a tagged > release will most often be a bit more robust than any in-between commit. The reason to use the master branch of gawk is that since the last release only one trivial change has been made to it, but the files were all renamed as well, so it was easier for that one-liner in the README to use master than the latest release. If we move to gnulib we'll also use the master branch, because that's what you're supposed to do according to their docs, they don't make releases.
Re: [PATCH] fixup! log: add exhaustive tests for pattern style options & config
Hi Jonathan, On Fri, 12 May 2017, Jonathan Nieder wrote: > Johannes Schindelin wrote: > > > On Windows, `(1|2)` is not a valid file name, and therefore the tag > > cannot be created as expected by the new test. > > > > So simply skip this test on Windows. > > > > Signed-off-by: Johannes Schindelin> > --- > > Reviewed-by: Jonathan Nieder > > I wouldn't be surprised if there are filesystems used places other > than MINGW that also can't handle this test. Isn't this what some > tests use a FUNNYNAMES prerequisite for? Some tests do that. And they all implement that prereq themselves, differently. t3600 tries to create a file whose name contains a tab. t4135 tries to create a file whose name contains a tab, but only if the MINGW prereq is set. t9903 tries to create a file whose name contains a newline. It seems, therefore, that the existing FUNNYNAMES are already in disagreement, and as you point out below, the file name in question contains neither a tab nor a newline. So we would introduce *yet another* disagreeing FUNNYNAMES. Besides, in some hopefully not so far future, our default refs backend may not depend on the current platform's file system's idionsyncracies. So maybe we do not even want the MINGW nor the FUNNYNAMES prereq... > In this example, it's the pipe that's not allowed, not the > parenthesis, right? (At least I have some memories of naming files > with some parentheses.) You would be correct in that assumptions. Long names on NTFS can contain parentheses. > Would something like > > test PIPE_IN_FILENAME ' > >"a|b" && > test -f "a|b" > ' > > work? No. Remember: on Windows, there is no Unix shell. (Actually, there is Bash on Ubuntu on Windows, and with the Creators Update, it became truly awesome, I am a big fan of it. But that is besides the point here: Git for Windows needs to support Windows versions older than Windows 10, where Bash on Ubuntu on Windows is simply unavailable.) So what Git for Windows has to do is quite a burden: we ship with MSYS2, kind of a light-weight version of Cygwin, to run our shell scripts. (I would wager a bet that at least a third of the bug reports relate to the POSIX emulation layer we ship to allow us to run Bash and Perl, but even that is only half the truth: tracking down those bugs takes easily the majority of my bug hunting/fixing time, hands down.) The test suite is no exception. I may have ranted about this about six dozen times already, so I'll save me the time. Just pretend that you now read seven long paragraphs describing how much it sucks to run the test suite on Windows. Take home message: Unix shell scripting is good for personal use. Don't use it in applications, not if you want to stay portable. Just don't. Back to the subject: The MSYS2 emulation layer inherits a neat trick from Cygwin, where it *can* create file names containing pipe symbols. They will be quietly mapped into a private UTF-8 page, and when Cygwin or MSYS2 read the file back, the file name maps from this page back to ASCII transparently. That strategy is all good and dandy, as long as you stay within the POSIX emulation layer. Git for Windows avoids the POSIX emulation layer as much as possible, for speed, and also for robustness. Which means that Git does *not* map the file names using said private UTF-8 code page. And therefore, your test would succeed (because the shell script would stay within the POSIX emulation layer, which creates that file using above-mentioned strategy), but Git (being a regular Win32 program) *still* would fail to create said file. Ciao, Dscho
Re: [PATCH] compat/regex: fix compilation on Windows
Hi Ævar, I originally replied in a very verbose manner, going step by step through the "one-liner", but decided to rephrase everything. So here goes. On Sat, 13 May 2017, Ævar Arnfjörð Bjarmason wrote: > Let's drop this current gawk import series. Well, the reason why you imported the current gawk regex is that we (you?) decided originally that it'd be easier to go with that one rather than with the gnulib one (which they friendly-forked). If you switch to gnulib, I would like to see (in the commit message) the original reason we picked gawk (which forked gnulib's regex code, after all), and why that reason no longer applies. I also would strongly prefer a *non* one-liner, in a commit message that adds the relevant source code from gawk or gnulib *verbatim*, followed by patches that adjust the code to Git's needs. That way, a future update can repeat the commands outlined in the first commit message and then cherry-pick the subsequent patches. And remember that you do not need to clone the entire repository. You can 1) fetch into the current one, and 2) use a shallow fetch. Example: git fetch --depth 1 http://git.savannah.gnu.org/r/gawk.git \ gawk-4.1.4 The next commands could be something like git read-tree --prefix=compat/regex/ FETCH_HEAD: git clean -dfx -- compat/regex/ Oh, and please do not use `master`. If Git is any indication, a tagged release will most often be a bit more robust than any in-between commit. Thanks, Dscho
Re: [PATCH 04/29] log: add exhaustive tests for pattern style options & config
On Fri, May 12, 2017 at 6:48 AM, Junio C Hamanowrote: > Ævar Arnfjörð Bjarmason writes: > >> Add exhaustive tests for how the different grep.patternType options & >> the corresponding command-line options affect git-log. >> ... >> The patterns being passed to fixed/basic/extended/PCRE are carefully >> crafted to return the wrong thing if the grep engine were to pick any >> other matching method than the one it's told to use. > > That is good; it may be even more helpful to the readers if they are > given a brief in-code comment. I had to scratch head while reading > them. > >> >> Signed-off-by: Ævar Arnfjörð Bjarmason >> --- >> t/t4202-log.sh | 77 >> +- >> 1 file changed, 76 insertions(+), 1 deletion(-) >> >> diff --git a/t/t4202-log.sh b/t/t4202-log.sh >> index f577990716..6d1411abea 100755 >> --- a/t/t4202-log.sh >> +++ b/t/t4202-log.sh >> @@ -262,7 +262,23 @@ test_expect_success 'log --grep -i' ' >> >> test_expect_success 'log -F -E --grep= uses ere' ' >> echo second >expect && >> - git log -1 --pretty="tformat:%s" -F -E --grep=s.c.nd >actual && > # BRE would need \(s\) to do the same. >> + git log -1 --pretty="tformat:%s" -F -E --grep="(s).c.nd" >actual && >> + test_cmp expect actual >> +' >> + >> +test_expect_success PCRE 'log -F -E --perl-regexp --grep= uses PCRE' ' >> + test_when_finished "rm -rf num_commits" && >> + git init num_commits && >> + ( >> + cd num_commits && >> + test_commit 1d && >> + test_commit 2e >> + ) && >> + echo 2e >expect && > # In PCRE \d in [\d] is like saying "0-9", and match '2' in 2e >> + git -C num_commits log -1 --pretty="tformat:%s" -F -E --perl-regexp >> --grep="[\d]" >actual && >> + test_cmp expect actual && >> + echo 1d >expect && > # In ERE [\d] is bs or letter 'd' and would not match '2' in '2e' > # but does match 'd' in '1d' >> + git -C num_commits log -1 --pretty="tformat:%s" -F -E --grep="[\d]" >> >actual && >> test_cmp expect actual >> ' >> >> @@ -280,6 +296,65 @@ test_expect_success 'log with grep.patternType >> configuration and command line' ' >> test_cmp expect actual >> ' >> >> +test_expect_success 'log with various grep.patternType configurations & >> command-lines' ' >> + git init pattern-type && >> + ( >> + cd pattern-type && >> + test_commit 1 file A && >> + test_commit "(1|2)" file B && >> + >> + echo "(1|2)" >expect.fixed && >> + cp expect.fixed expect.basic && >> + cp expect.fixed expect.extended && >> + cp expect.fixed expect.perl && >> + > # Fixed finds these literally >> + git -c grep.patternType=fixed log --pretty=tformat:%s \ >> + --grep="(1|2)" >actual.fixed && > # BRE matches (, |, and ) literally >> + git -c grep.patternType=basic log --pretty=tformat:%s \ >> + --grep="(.|.)" >actual.basic && > # ERE needs | quoted with bs to match | literally >> + git -c grep.patternType=extended log --pretty=tformat:%s \ >> + --grep="\|2" >actual.extended && > > If we use BRE by mistake, wouldn't this pattern actually find > "(1|2)" anyway, without skipping it and show "1 file A" instead? It'll find (1|2) but also 1, i.e.: $ (echo 1; echo "(1|2)") >/tmp/f; for t in G E; do echo $t: && grep -$t '\|2' /tmp/f | sed 's/^/ /'; done G: 1 (1|2) E: (1|2) So the test will fail under basic. I'll add comments about this & the other things you suggested. >> + if test_have_prereq PCRE >> + then >> + git -c grep.patternType=perl log --pretty=tformat:%s \ >> + --grep="[\d]\|" >actual.perl > # Only PCRE would match [\d]\| with "(1|2)" due to > [\d] >> + fi && >> + test_cmp expect.fixed actual.fixed && >> + test_cmp expect.basic actual.basic && >> + test_cmp expect.extended actual.extended && >> + if test_have_prereq PCRE >> + then >> + test_cmp expect.perl actual.perl >> + fi && > > It could be just a style thing, but I would actually find it easier > to follow if the above two "only with PCRE" tests, i.e. running test > and taking its output in actual.perl and comparing it with > expect.perl, were inside a single "if test_have_prereq PCRE" block. Sure, will fix.
Re: [PATCH v4 03/12] t/test-lib-functions.sh: allow to specify the tag name to test_commit
On Tue, Feb 12, 2013 at 11:17 AM, Brandon Caseywrote: > The part of test_commit() may not be appropriate for a tag name. > So let's allow test_commit to accept a fourth argument to specify the tag > name. [Kind of late to notice, I know] I see nobody spotted in four rounds of reviews that this commit didn't update the corresponding t/README docs for test_commit. > Signed-off-by: Brandon Casey > Reviewed-by: Jonathan Nieder > --- > t/test-lib-functions.sh | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh > index fa62d01..61d0804 100644 > --- a/t/test-lib-functions.sh > +++ b/t/test-lib-functions.sh > @@ -135,12 +135,12 @@ test_pause () { > fi > } > > -# Call test_commit with the arguments " [ []]" > +# Call test_commit with the arguments " [ [ > []]]" > # > # This will commit a file with the given contents and the given commit > -# message. It will also add a tag with as name. > +# message, and tag the resulting commit with the given tag name. > # > -# Both and default to . > +# , , and all default to . > > test_commit () { > notick= && > @@ -168,7 +168,7 @@ test_commit () { > test_tick > fi && > git commit $signoff -m "$1" && > - git tag "$1" > + git tag "${4:-$1}" > } > > # Call test_merge with the arguments " ", where > -- > 1.8.1.3.579.gd9af3b6 > > -- > To unsubscribe from this list: send the line "unsubscribe git" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/29] grep: amend submodule recursion test for regex engine testing
On Fri, May 12, 2017 at 6:59 AM, Junio C Hamanowrote: > Ævar Arnfjörð Bjarmason writes: > >> Amend the submodule recursion test to prepare it for subsequent tests >> of whether it passes along the grep.patternType to the submodule >> greps. >> >> This is the result of searching & replacing: >> >> foobar -> (1|2)d(3|4) >> foo-> (1|2) >> bar-> (3|4) >> ... >> test_expect_success 'grep and multiple patterns' ' >> cat >expect <<-\EOF && >> - b/b:bar >> + b/b:(3|4) >> EOF >> >> - git grep -e "bar" --and --not -e "foo" --recurse-submodules >actual && >> + git grep -e "(3|4)" --and --not -e "(1|2)d" --recurse-submodules >> >actual && > > > This breaks the promise "foo maps to (1|2)"; I do not think you need > to add 'd' in order to make the test to succeed, so I am not sure > what is going on here. Thanks, fixing it. That was just a stupid mistake on my part, don't know how that snuck in there, must have just fat-fingered (1|2) as (1|2)d during interactive replace.
[PATCH 5/5] p0004: don't error out if test repo is too small
Repositories with less than 4000 entries are always handled using a single thread, causing test-lazy-init-name-hash --multi to error out. Don't abort the whole test script in that case, but simply skip the multi-threaded performance check. We can still use it to compare the single-threaded speed of different versions in that case. Signed-off-by: Rene Scharfe--- t/perf/p0004-lazy-init-name-hash.sh | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/t/perf/p0004-lazy-init-name-hash.sh b/t/perf/p0004-lazy-init-name-hash.sh index 3c2135a185..8de5a98cfc 100755 --- a/t/perf/p0004-lazy-init-name-hash.sh +++ b/t/perf/p0004-lazy-init-name-hash.sh @@ -8,10 +8,13 @@ test_checkout_worktree test_expect_success 'verify both methods build the same hashmaps' ' test-lazy-init-name-hash --dump --single >out.single && - test-lazy-init-name-hash --dump --multi >out.multi && - sort sorted.single && - sort sorted.multi && - test_cmp sorted.single sorted.multi + if test-lazy-init-name-hash --dump --multi >out.multi + then + test_set_prereq REPO_BIG_ENOUGH_FOR_MULTI && + sort sorted.single && + sort sorted.multi && + test_cmp sorted.single sorted.multi + fi ' test_expect_success 'calibrate' ' @@ -46,7 +49,7 @@ test_perf "single-threaded, $desc" " test-lazy-init-name-hash --single --count=$count " -test_perf "multi-threaded, $desc" " +test_perf REPO_BIG_ENOUGH_FOR_MULTI "multi-threaded, $desc" " test-lazy-init-name-hash --multi --count=$count " -- 2.12.2
[PATCH 4/5] p0004: don't abort if multi-threaded is too slow
If the single-threaded variant beats the multi-threaded one then we may have a performance bug, but that doesn't justify aborting the test. Drop that check; we can compare the results for --single and --multi using the actual performance tests. Signed-off-by: Rene Scharfe--- t/perf/p0004-lazy-init-name-hash.sh | 4 1 file changed, 4 deletions(-) diff --git a/t/perf/p0004-lazy-init-name-hash.sh b/t/perf/p0004-lazy-init-name-hash.sh index d30c32f97b..3c2135a185 100755 --- a/t/perf/p0004-lazy-init-name-hash.sh +++ b/t/perf/p0004-lazy-init-name-hash.sh @@ -14,10 +14,6 @@ test_expect_success 'verify both methods build the same hashmaps' ' test_cmp sorted.single sorted.multi ' -test_expect_success 'multithreaded should be faster' ' - test-lazy-init-name-hash --perf >out.perf -' - test_expect_success 'calibrate' ' entries=$(wc -l
[PATCH 3/5] p0004: use test_perf
The perf test suite (more specifically: t/perf/aggregate.perl) requires each test script to write test results into a file, otherwise it aborts when aggregating. Add actual performance tests with test_perf to allow p0004 to be run together with other perf scripts. Calibrate the value for the parameter --count based on the size of the test repository, in order to get meaningful results with smaller repos yet still be able to finish the script against huge ones without having to wait for hours. Signed-off-by: Rene Scharfe--- The numbers are just guesses; I didn't actually test all ranges. t/perf/p0004-lazy-init-name-hash.sh | 36 1 file changed, 36 insertions(+) diff --git a/t/perf/p0004-lazy-init-name-hash.sh b/t/perf/p0004-lazy-init-name-hash.sh index 576bdc3c4e..d30c32f97b 100755 --- a/t/perf/p0004-lazy-init-name-hash.sh +++ b/t/perf/p0004-lazy-init-name-hash.sh @@ -18,4 +18,40 @@ test_expect_success 'multithreaded should be faster' ' test-lazy-init-name-hash --perf >out.perf ' +test_expect_success 'calibrate' ' + entries=$(wc -l
[PATCH 2/5] p0004: avoid using pipes
The return code of commands on the producing end of a pipe is ignored. Evaluate the outcome of test-lazy-init-name-hash by calling sort separately. Signed-off-by: Rene Scharfe--- t/perf/p0004-lazy-init-name-hash.sh | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/t/perf/p0004-lazy-init-name-hash.sh b/t/perf/p0004-lazy-init-name-hash.sh index 716c951553..576bdc3c4e 100755 --- a/t/perf/p0004-lazy-init-name-hash.sh +++ b/t/perf/p0004-lazy-init-name-hash.sh @@ -7,9 +7,11 @@ test_perf_large_repo test_checkout_worktree test_expect_success 'verify both methods build the same hashmaps' ' - test-lazy-init-name-hash --dump --single | sort >out.single && - test-lazy-init-name-hash --dump --multi | sort >out.multi && - test_cmp out.single out.multi + test-lazy-init-name-hash --dump --single >out.single && + test-lazy-init-name-hash --dump --multi >out.multi && + sort sorted.single && + sort sorted.multi && + test_cmp sorted.single sorted.multi ' test_expect_success 'multithreaded should be faster' ' -- 2.12.2
[PATCH 1/5] p0004: simplify calls of test-lazy-init-name-hash
The test library puts helpers into $PATH, so we can simply call them without specifying their location. The suffix $X is also not necessary because .exe files on Windows can be started without specifying their extension, and on other platforms it's empty anyway. Signed-off-by: Rene Scharfe--- t/perf/p0004-lazy-init-name-hash.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/t/perf/p0004-lazy-init-name-hash.sh b/t/perf/p0004-lazy-init-name-hash.sh index 5afa8c8df3..716c951553 100755 --- a/t/perf/p0004-lazy-init-name-hash.sh +++ b/t/perf/p0004-lazy-init-name-hash.sh @@ -7,13 +7,13 @@ test_perf_large_repo test_checkout_worktree test_expect_success 'verify both methods build the same hashmaps' ' - $GIT_BUILD_DIR/t/helper/test-lazy-init-name-hash$X --dump --single | sort >out.single && - $GIT_BUILD_DIR/t/helper/test-lazy-init-name-hash$X --dump --multi | sort >out.multi && + test-lazy-init-name-hash --dump --single | sort >out.single && + test-lazy-init-name-hash --dump --multi | sort >out.multi && test_cmp out.single out.multi ' test_expect_success 'multithreaded should be faster' ' - $GIT_BUILD_DIR/t/helper/test-lazy-init-name-hash$X --perf >out.perf + test-lazy-init-name-hash --perf >out.perf ' test_done -- 2.12.2
[PATCH 0/5] p0004: support being called by t/perf/run
p0004-lazy-init-name-hash.sh errors out if the test repo is too small, and doesn't generate any perf test results even if it finishes successfully. That prevents t/perf/run from running the whole test suite. This series tries to address these issues. p0004: simplify calls of test-lazy-init-name-hash p0004: avoid using pipes p0004: use test_perf p0004: don't abort if multi-threaded is too slow p0004: don't error out if test repo is too small t/perf/p0004-lazy-init-name-hash.sh | 47 + 1 file changed, 42 insertions(+), 5 deletions(-) -- 2.12.2