date:20170513

Re: [PATCH 18/19] diff: buffer all output if asked to

2017-05-13 Thread Stefan Beller

On Sat, May 13, 2017 at 9:06 PM, Jeff King  wrote:
> On Sat, May 13, 2017 at 09:01:16PM -0700, Stefan Beller wrote:
>
>> + for (i = 0; i < o->line_buffer_nr; i++);
>> + free((void*)o->line_buffer[i].line);
>
> I haven't looked at the patches yet, but this ";" on the for line is
> almost certainly a typo (gcc catches it due to the misleading
> indentation of the next line).

Grr  :/

I have spent hours trying to figure out why this does not work,
questioning the design, my mental model of how pointers work
and programming in general.

/me should get gcc 6 and set  -Wmisleading-indentation

Thanks,
Stefan

>
> -Peff

Re: [PATCH 18/19] diff: buffer all output if asked to

2017-05-13 Thread Jeff King

On Sat, May 13, 2017 at 09:01:16PM -0700, Stefan Beller wrote:

> + for (i = 0; i < o->line_buffer_nr; i++);
> + free((void*)o->line_buffer[i].line);

I haven't looked at the patches yet, but this ";" on the for line is
almost certainly a typo (gcc catches it due to the misleading
indentation of the next line).

-Peff

[PATCH 15/19] diff.c: convert diff_flush to use emit_line_*

2017-05-13 Thread Stefan Beller

In a later patch, I want to propose an option to detect
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This covers diff_flush.

Signed-off-by: Stefan Beller 
---
 diff.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 07041a35ad..386b28cf47 100644
--- a/diff.c
+++ b/diff.c
@@ -4873,7 +4873,9 @@ void diff_flush(struct diff_options *options)
  term, 1);
if (options->stat_sep) {
/* attach patch instead of inline */
-   fputs(options->stat_sep, options->file);
+   emit_line(options, NULL, NULL,
+ options->stat_sep,
+ strlen(options->stat_sep));
}
}
 
-- 
2.13.0.18.g183880de0a

[PATCH 05/19] diff.c: emit_line_0 can handle no color setting

2017-05-13 Thread Stefan Beller

In a later patch, I want to propose an option to detect
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

In later patches we may pass lines that are not colored to
the central function emit_line_0, so we
need to emit the color only when it is non-NULL.

Signed-off-by: Stefan Beller 
---
 diff.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 381b572d76..48f0fb98dc 100644
--- a/diff.c
+++ b/diff.c
@@ -532,11 +532,13 @@ static void emit_line_0(struct diff_options *o, const 
char *set, const char *res
len--;
 
if (len || sign) {
-   fputs(set, file);
+   if (set)
+   fputs(set, file);
if (sign)
fputc(sign, file);
fwrite(line, len, 1, file);
-   fputs(reset, file);
+   if (reset)
+   fputs(reset, file);
}
if (has_trailing_carriage_return)
fputc('\r', file);
-- 
2.13.0.18.g183880de0a

[PATCH 04/19] diff.c: factor out diff_flush_patch_all_file_pairs

2017-05-13 Thread Stefan Beller

In a later patch we want to do more things before and after all filepairs
are flushed. So factor flushing out all file pairs into its own function
that the new code can be plugged in easily.

Signed-off-by: Stefan Beller 
---
 diff.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index 4269b8dccf..381b572d76 100644
--- a/diff.c
+++ b/diff.c
@@ -4728,6 +4728,17 @@ void diff_warn_rename_limit(const char *varname, int 
needed, int degraded_cc)
warning(_(rename_limit_advice), varname, needed);
 }
 
+static void diff_flush_patch_all_file_pairs(struct diff_options *o)
+{
+   int i;
+   struct diff_queue_struct *q = _queued_diff;
+   for (i = 0; i < q->nr; i++) {
+   struct diff_filepair *p = q->queue[i];
+   if (check_pair_status(p))
+   diff_flush_patch(p, o);
+   }
+}
+
 void diff_flush(struct diff_options *options)
 {
struct diff_queue_struct *q = _queued_diff;
@@ -4825,11 +4836,7 @@ void diff_flush(struct diff_options *options)
}
}
 
-   for (i = 0; i < q->nr; i++) {
-   struct diff_filepair *p = q->queue[i];
-   if (check_pair_status(p))
-   diff_flush_patch(p, options);
-   }
+   diff_flush_patch_all_file_pairs(options);
}
 
if (output_format & DIFF_FORMAT_CALLBACK)
-- 
2.13.0.18.g183880de0a

[PATCH 13/19] diff.c: convert show_stats to use emit_line_*

2017-05-13 Thread Stefan Beller

In a later patch, I want to propose an option to detect
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

We call print_stat_summary from builtin/apply, so we still
need the version with a file pointer, so introduce
print_stat_summary_0 that uses emit_line_* machinery and
keep print_stat_summary with the same arguments around.

Signed-off-by: Stefan Beller 
---
 diff.c | 89 ++
 diff.h |  4 +--
 2 files changed, 53 insertions(+), 40 deletions(-)

diff --git a/diff.c b/diff.c
index ffdb728810..91dc045a45 100644
--- a/diff.c
+++ b/diff.c
@@ -1529,20 +1529,19 @@ static int scale_linear(int it, int width, int 
max_change)
return 1 + (it * (width - 1) / max_change);
 }
 
-static void show_name(FILE *file,
+static void show_name(struct strbuf *out,
  const char *prefix, const char *name, int len)
 {
-   fprintf(file, " %s%-*s |", prefix, len, name);
+   strbuf_addf(out, " %s%-*s |", prefix, len, name);
 }
 
-static void show_graph(FILE *file, char ch, int cnt, const char *set, const 
char *reset)
+static void show_graph(struct strbuf *out, char ch, int cnt, const char *set, 
const char *reset)
 {
if (cnt <= 0)
return;
-   fprintf(file, "%s", set);
-   while (cnt--)
-   putc(ch, file);
-   fprintf(file, "%s", reset);
+   strbuf_addstr(out, set);
+   strbuf_addchars(out, ch, cnt);
+   strbuf_addstr(out, reset);
 }
 
 static void fill_print_name(struct diffstat_file *file)
@@ -1566,14 +1565,16 @@ static void fill_print_name(struct diffstat_file *file)
file->print_name = pname;
 }
 
-int print_stat_summary(FILE *fp, int files, int insertions, int deletions)
+void print_stat_summary_0(struct diff_options *options, int files,
+ int insertions, int deletions)
 {
struct strbuf sb = STRBUF_INIT;
-   int ret;
 
if (!files) {
assert(insertions == 0 && deletions == 0);
-   return fprintf(fp, "%s\n", " 0 files changed");
+   strbuf_addstr(, " 0 files changed");
+   emit_line(options, NULL, NULL, sb.buf, sb.len);
+   return;
}
 
strbuf_addf(,
@@ -1600,9 +1601,17 @@ int print_stat_summary(FILE *fp, int files, int 
insertions, int deletions)
deletions);
}
strbuf_addch(, '\n');
-   ret = fputs(sb.buf, fp);
+   emit_line(options, NULL, NULL, sb.buf, sb.len);
strbuf_release();
-   return ret;
+}
+
+void print_stat_summary(FILE *fp, int files,
+   int insertions, int deletions)
+{
+   struct diff_options o;
+   memset(, 0, sizeof(o));
+   o.file = fp;
+   print_stat_summary_0(, files, insertions, deletions);
 }
 
 static void show_stats(struct diffstat_t *data, struct diff_options *options)
@@ -1612,13 +1621,13 @@ static void show_stats(struct diffstat_t *data, struct 
diff_options *options)
int total_files = data->nr, count;
int width, name_width, graph_width, number_width = 0, bin_width = 0;
const char *reset, *add_c, *del_c;
-   const char *line_prefix = "";
int extra_shown = 0;
+   const char *line_prefix = diff_line_prefix(options);
+   struct strbuf out = STRBUF_INIT;
 
if (data->nr == 0)
return;
 
-   line_prefix = diff_line_prefix(options);
count = options->stat_count ? options->stat_count : data->nr;
 
reset = diff_get_color_opt(options, DIFF_RESET);
@@ -1772,26 +1781,29 @@ static void show_stats(struct diffstat_t *data, struct 
diff_options *options)
}
 
if (file->is_binary) {
-   fprintf(options->file, "%s", line_prefix);
-   show_name(options->file, prefix, name, len);
-   fprintf(options->file, " %*s", number_width, "Bin");
+   show_name(, prefix, name, len);
+   strbuf_addf(, " %*s", number_width, "Bin");
if (!added && !deleted) {
-   putc('\n', options->file);
+   strbuf_addch(, '\n');
+   emit_line(options, NULL, NULL, out.buf, 
out.len);
+   strbuf_reset();
continue;
}
-   fprintf(options->file, " %s%"PRIuMAX"%s",
+   strbuf_addf(, " %s%"PRIuMAX"%s",
del_c, deleted, reset);
-

[PATCH 09/19] diff.c: convert emit_rewrite_diff to use emit_line_*

2017-05-13 Thread Stefan Beller

In a later patch, I want to propose an option to detect
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This covers emit_rewrite_diff.

Signed-off-by: Stefan Beller 
---
 diff.c | 33 ++---
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/diff.c b/diff.c
index 8e00010bf4..e4b46fee4f 100644
--- a/diff.c
+++ b/diff.c
@@ -708,17 +708,17 @@ static void remove_tempfile(void)
}
 }
 
-static void print_line_count(FILE *file, int count)
+static void add_line_count(struct strbuf *out, int count)
 {
switch (count) {
case 0:
-   fprintf(file, "0,0");
+   strbuf_addstr(out, "0,0");
break;
case 1:
-   fprintf(file, "1");
+   strbuf_addstr(out, "1");
break;
default:
-   fprintf(file, "1,%d", count);
+   strbuf_addf(out, "1,%d", count);
break;
}
 }
@@ -772,7 +772,7 @@ static void emit_rewrite_diff(const char *name_a,
char *data_one, *data_two;
size_t size_one, size_two;
struct emit_callback ecbdata;
-   const char *line_prefix = diff_line_prefix(o);
+   struct strbuf out = STRBUF_INIT;
 
if (diff_mnemonic_prefix && DIFF_OPT_TST(o, REVERSE_DIFF)) {
a_prefix = o->b_prefix;
@@ -810,20 +810,23 @@ static void emit_rewrite_diff(const char *name_a,
ecbdata.lno_in_preimage = 1;
ecbdata.lno_in_postimage = 1;
 
+   emit_line_fmt(o, metainfo, reset, "--- %s%s\n", a_name.buf, name_a_tab);
+   emit_line_fmt(o, metainfo, reset, "+++ %s%s\n", b_name.buf, name_b_tab);
+
lc_a = count_lines(data_one, size_one);
lc_b = count_lines(data_two, size_two);
-   fprintf(o->file,
-   "%s%s--- %s%s%s\n%s%s+++ %s%s%s\n%s%s@@ -",
-   line_prefix, metainfo, a_name.buf, name_a_tab, reset,
-   line_prefix, metainfo, b_name.buf, name_b_tab, reset,
-   line_prefix, fraginfo);
+
+   strbuf_addstr(, "@@ -");
if (!o->irreversible_delete)
-   print_line_count(o->file, lc_a);
+   add_line_count(, lc_a);
else
-   fprintf(o->file, "?,?");
-   fprintf(o->file, " +");
-   print_line_count(o->file, lc_b);
-   fprintf(o->file, " @@%s\n", reset);
+   strbuf_addstr(, "?,?");
+   strbuf_addstr(, " +");
+   add_line_count(, lc_b);
+   strbuf_addstr(, " @@\n");
+   emit_line(o, fraginfo, reset, out.buf, out.len);
+   strbuf_release();
+
if (lc_a && !o->irreversible_delete)
emit_rewrite_lines(, '-', data_one, size_one);
if (lc_b)
-- 
2.13.0.18.g183880de0a

[PATCH 18/19] diff: buffer all output if asked to

2017-05-13 Thread Stefan Beller

Introduce a new option 'use_buffer' in the struct diff_options which
controls whether all output is buffered up until all output is available.

We'll have two new structs in diff.h, one of them 'buffered_patch_line'
will be used to buffer each line, and the other 'buffered_patch_file_pair'
will store information about information relevant on a per file basis.
The buffered_patch_line will duplicate the memory of the line to buffer
as that is easiest to reason about for now.  In a future patch we may want
to decrease the memory usage by not duplicating all output for buffering
but rather we may want to store offsets into the file or in case of hunk
descriptions such as the similarity score, we could just store the relevant
number and reproduce the text later on.

This approach was chosen as a first step because it is quite simple
compared to the alternative with less memory footprint.

emit_line_0 factors out the emission part into emit_line_emission,
and depending on the diff_options->use_buffer the emission
will be performed directly when calling emit_line_0 or after the
whole process is done, i.e. by buffering we have add the possibility
for a second pass over the whole output before doing the actual
output.

In 6440d34 (2012-03-14, diff: tweak a _copy_ of diff_options with
word-diff) we introduced a duplicate diff options struct for word
emissions as we may have different regex settings in there.

When buffering the output, we need to operate on just one buffer,
so we have to copy back the emissions of the word buffer into the main
buffer.

Unconditionally enable output via buffer in this patch as it yields
a great opportunity for testing, i.e. all the diff tests from the
test suite pass without having reordering issues (i.e. only parts
of the output got buffered, and we forgot to buffer other parts).
The test suite passes, which gives confidence that we converted all
functions to use emit_line_* for output.

Signed-off-by: Stefan Beller 
---
 diff.c | 158 ++---
 diff.h |  39 
 2 files changed, 181 insertions(+), 16 deletions(-)

diff --git a/diff.c b/diff.c
index 08dcc56bb9..dbab7fb44e 100644
--- a/diff.c
+++ b/diff.c
@@ -516,29 +516,29 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t 
*mf2,
ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-static void emit_line_0(struct diff_options *o, const char *set, const char 
*reset,
-   int sign, const char *line, int len)
+static void emit_buffered_patch_line(struct diff_options *o,
+struct buffered_patch_line *e)
 {
-   int has_trailing_newline, has_trailing_carriage_return;
+   int has_trailing_newline, has_trailing_carriage_return, len = e->len;
FILE *file = o->file;
 
fputs(diff_line_prefix(o), file);
 
-   has_trailing_newline = (len > 0 && line[len-1] == '\n');
+   has_trailing_newline = (len > 0 && e->line[len-1] == '\n');
if (has_trailing_newline)
len--;
-   has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
+   has_trailing_carriage_return = (len > 0 && e->line[len-1] == '\r');
if (has_trailing_carriage_return)
len--;
 
-   if (len || sign) {
-   if (set)
-   fputs(set, file);
-   if (sign)
-   fputc(sign, file);
-   fwrite(line, len, 1, file);
-   if (reset)
-   fputs(reset, file);
+   if (len || e->sign) {
+   if (e->set)
+   fputs(e->set, file);
+   if (e->sign)
+   fputc(e->sign, file);
+   fwrite(e->line, len, 1, file);
+   if (e->reset)
+   fputs(e->reset, file);
}
if (has_trailing_carriage_return)
fputc('\r', file);
@@ -546,6 +546,65 @@ static void emit_line_0(struct diff_options *o, const char 
*set, const char *res
fputc('\n', file);
 }
 
+static void emit_buffered_patch_line_ws(struct diff_options *o,
+   struct buffered_patch_line *e,
+   const char *ws, unsigned ws_rule)
+{
+   struct buffered_patch_line s = {e->set, e->reset, "", 0, e->sign};
+
+   emit_buffered_patch_line(o, );
+   ws_check_emit(e->line, e->len, ws_rule,
+ o->file, e->set, e->reset, ws);
+}
+
+static void process_next_buffered_patch_line(struct diff_options *o, int 
line_no)
+{
+   struct buffered_patch_line *e = >line_buffer[line_no];
+
+   const char *ws = o->current_filepair->ws;
+   unsigned ws_rule = o->current_filepair->ws_rule;
+
+   switch (e->state) {
+   case BPL_EMIT_LINE_ASIS:
+   emit_buffered_patch_line(o, e);
+   break;
+   case

[RFC PATCH 00/19] Diff machine: highlight moved lines.

2017-05-13 Thread Stefan Beller

For details on *why* see the commit message of the last commit.

The first five patches are slight refactorings to get into good
shape, the next patches are funneling all output through emit_line_*.

The second last patch introduces an option to buffer up all output
before printing, and then the last patch can color up moved lines
of code.

Any feedback welcome.

Thanks,
Stefan 


Stefan Beller (19):
  diff: readability fix
  diff: move line ending check into emit_hunk_header
  diff.c: drop 'nofirst' from emit_line_0
  diff.c: factor out diff_flush_patch_all_file_pairs
  diff.c: emit_line_0 can handle no color setting
  diff: add emit_line_fmt
  diff.c: convert fn_out_consume to use emit_line_*
  diff.c: convert builtin_diff to use emit_line_*
  diff.c: convert emit_rewrite_diff to use emit_line_*
  diff.c: convert emit_rewrite_lines to use emit_line_*
  submodule.c: convert show_submodule_summary to use emit_line_fmt
  diff.c: convert emit_binary_diff_body to use emit_line_*
  diff.c: convert show_stats to use emit_line_*
  diff.c: convert word diffing to use emit_line_*
  diff.c: convert diff_flush to use emit_line_*
  diff.c: convert diff_summary to use emit_line_*
  diff.c: factor out emit_line_ws for coloring whitespaces
  diff: buffer all output if asked to
  diff.c: color moved lines differently

 Documentation/config.txt   |  12 +-
 diff.c | 815 ++---
 diff.h |  69 +++-
 submodule.c|  78 ++---
 submodule.h|   9 +-
 t/t4015-diff-whitespace.sh | 229 +
 6 files changed, 960 insertions(+), 252 deletions(-)

-- 
2.13.0.18.g183880de0a

[PATCH 01/19] diff: readability fix

2017-05-13 Thread Stefan Beller

We already have dereferenced 'p->two' into a local variable 'two'. Use
that.

Signed-off-by: Stefan Beller 
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 74283d9001..3f5bf8b5a4 100644
--- a/diff.c
+++ b/diff.c
@@ -3283,8 +3283,8 @@ static void run_diff(struct diff_filepair *p, struct 
diff_options *o)
const char *other;
const char *attr_path;
 
-   name  = p->one->path;
-   other = (strcmp(name, p->two->path) ? p->two->path : NULL);
+   name  = one->path;
+   other = (strcmp(name, two->path) ? two->path : NULL);
attr_path = name;
if (o->prefix_length)
strip_prefix(o->prefix_length, , );
-- 
2.13.0.18.g183880de0a

[PATCH 12/19] diff.c: convert emit_binary_diff_body to use emit_line_*

2017-05-13 Thread Stefan Beller

In a later patch, I want to propose an option to detect
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This covers emit_binary_diff_body.

Signed-off-by: Stefan Beller 
---
 diff.c | 39 ++-
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/diff.c b/diff.c
index 45ec311828..ffdb728810 100644
--- a/diff.c
+++ b/diff.c
@@ -2233,8 +2233,8 @@ static unsigned char *deflate_it(char *data,
return deflated;
 }
 
-static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
- const char *prefix)
+static void emit_binary_diff_body(struct diff_options *o,
+ mmfile_t *one, mmfile_t *two)
 {
void *cp;
void *delta;
@@ -2263,13 +2263,12 @@ static void emit_binary_diff_body(FILE *file, mmfile_t 
*one, mmfile_t *two,
}
 
if (delta && delta_size < deflate_size) {
-   fprintf(file, "%sdelta %lu\n", prefix, orig_size);
+   emit_line_fmt(o, NULL, NULL, "delta %lu\n", orig_size);
free(deflated);
data = delta;
data_size = delta_size;
-   }
-   else {
-   fprintf(file, "%sliteral %lu\n", prefix, two->size);
+   } else {
+   emit_line_fmt(o, NULL, NULL, "literal %lu\n", two->size);
free(delta);
data = deflated;
data_size = deflate_size;
@@ -2278,8 +2277,9 @@ static void emit_binary_diff_body(FILE *file, mmfile_t 
*one, mmfile_t *two,
/* emit data encoded in base85 */
cp = data;
while (data_size) {
+   int len;
int bytes = (52 < data_size) ? 52 : data_size;
-   char line[70];
+   char line[71];
data_size -= bytes;
if (bytes <= 26)
line[0] = bytes + 'A' - 1;
@@ -2287,20 +2287,25 @@ static void emit_binary_diff_body(FILE *file, mmfile_t 
*one, mmfile_t *two,
line[0] = bytes - 26 + 'a' - 1;
encode_85(line + 1, cp, bytes);
cp = (char *) cp + bytes;
-   fprintf(file, "%s", prefix);
-   fputs(line, file);
-   fputc('\n', file);
+
+   len = strlen(line);
+   line[len++] = '\n';
+   line[len] = '\0';
+
+   emit_line(o, NULL, NULL, line, len);
}
-   fprintf(file, "%s\n", prefix);
+   emit_line(o, NULL, NULL, "\n", 1);
free(data);
 }
 
-static void emit_binary_diff(FILE *file, mmfile_t *one, mmfile_t *two,
-const char *prefix)
+static void emit_binary_diff(struct diff_options *o,
+mmfile_t *one, mmfile_t *two)
 {
-   fprintf(file, "%sGIT binary patch\n", prefix);
-   emit_binary_diff_body(file, one, two, prefix);
-   emit_binary_diff_body(file, two, one, prefix);
+   const char *s = "GIT binary patch\n";
+   const int len = strlen(s);
+   emit_line(o, NULL, NULL, s, len);
+   emit_binary_diff_body(o, one, two);
+   emit_binary_diff_body(o, two, one);
 }
 
 int diff_filespec_is_binary(struct diff_filespec *one)
@@ -2489,7 +2494,7 @@ static void builtin_diff(const char *name_a,
  header.buf, header.len);
strbuf_reset();
if (DIFF_OPT_TST(o, BINARY))
-   emit_binary_diff(o->file, , , line_prefix);
+   emit_binary_diff(o, , );
else
emit_line_fmt(o, NULL, NULL,
  "Binary files %s and %s differ\n",
-- 
2.13.0.18.g183880de0a

[PATCH 06/19] diff: add emit_line_fmt

2017-05-13 Thread Stefan Beller

In the following patches we'll convert all printing functions to use
the emit_line_* family of functions.

Many of the printing functions to be converted are formatted. So offer
a formatted function in the emit_line function family as well.

Signed-off-by: Stefan Beller 
---
 diff.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/diff.c b/diff.c
index 48f0fb98dc..aef159a919 100644
--- a/diff.c
+++ b/diff.c
@@ -552,6 +552,20 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
emit_line_0(o, set, reset, 0, line, len);
 }
 
+static void emit_line_fmt(struct diff_options *o,
+ const char *set, const char *reset,
+ const char *fmt, ...)
+{
+   struct strbuf sb = STRBUF_INIT;
+   va_list ap;
+   va_start(ap, fmt);
+   strbuf_vaddf(, fmt, ap);
+   va_end(ap);
+
+   emit_line(o, set, reset, sb.buf, sb.len);
+   strbuf_release();
+}
+
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char 
*line, int len)
 {
if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
-- 
2.13.0.18.g183880de0a

[PATCH 03/19] diff.c: drop 'nofirst' from emit_line_0

2017-05-13 Thread Stefan Beller

In 250f79930d (diff.c: split emit_line() from the first char and the rest
of the line, 2009-09-14) we introduced the local variable 'nofirst' that
indicates if we have no first sign character. With the given implementation
we had to use an extra variable unlike reusing 'first' because the lines
first character could be '\0'.

Change the meaning of the 'first' argument to not mean the first character
of the line, but rather just containing the sign that is prepended to the
line. Refactor emit_line to not include the lines first character, but pass
the complete line as well as a '\0' sign, which now serves as an indication
not to print a sign.

With this patch other callers hard code the sign (which are '+', '-',
' ' and '\\') such that we do not run into unexpectedly emitting an
error-nous '\0'.

The audit of the caller revealed that the sign cannot be '\n' or '\r',
so remove that condition for trailing newline or carriage return in the
sign; the else part of the condition handles the len==0 perfectly,
so we can drop the if/else construct.

Signed-off-by: Stefan Beller 
---
 diff.c | 40 +---
 1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/diff.c b/diff.c
index c2ed605cd0..4269b8dccf 100644
--- a/diff.c
+++ b/diff.c
@@ -517,33 +517,24 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t 
*mf2,
 }
 
 static void emit_line_0(struct diff_options *o, const char *set, const char 
*reset,
-   int first, const char *line, int len)
+   int sign, const char *line, int len)
 {
int has_trailing_newline, has_trailing_carriage_return;
-   int nofirst;
FILE *file = o->file;
 
fputs(diff_line_prefix(o), file);
 
-   if (len == 0) {
-   has_trailing_newline = (first == '\n');
-   has_trailing_carriage_return = (!has_trailing_newline &&
-   (first == '\r'));
-   nofirst = has_trailing_newline || has_trailing_carriage_return;
-   } else {
-   has_trailing_newline = (len > 0 && line[len-1] == '\n');
-   if (has_trailing_newline)
-   len--;
-   has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
-   if (has_trailing_carriage_return)
-   len--;
-   nofirst = 0;
-   }
+   has_trailing_newline = (len > 0 && line[len-1] == '\n');
+   if (has_trailing_newline)
+   len--;
+   has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
+   if (has_trailing_carriage_return)
+   len--;
 
-   if (len || !nofirst) {
+   if (len || sign) {
fputs(set, file);
-   if (!nofirst)
-   fputc(first, file);
+   if (sign)
+   fputc(sign, file);
fwrite(line, len, 1, file);
fputs(reset, file);
}
@@ -556,7 +547,7 @@ static void emit_line_0(struct diff_options *o, const char 
*set, const char *res
 static void emit_line(struct diff_options *o, const char *set, const char 
*reset,
  const char *line, int len)
 {
-   emit_line_0(o, set, reset, line[0], line+1, len-1);
+   emit_line_0(o, set, reset, 0, line, len);
 }
 
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char 
*line, int len)
@@ -4822,9 +4813,12 @@ void diff_flush(struct diff_options *options)
 
if (output_format & DIFF_FORMAT_PATCH) {
if (separator) {
-   fprintf(options->file, "%s%c",
-   diff_line_prefix(options),
-   options->line_termination);
+   char term[2];
+   term[0] = options->line_termination;
+   term[1] = '\0';
+
+   emit_line(options, NULL, NULL,
+ term, 1);
if (options->stat_sep) {
/* attach patch instead of inline */
fputs(options->stat_sep, options->file);
-- 
2.13.0.18.g183880de0a

[PATCH 08/19] diff.c: convert builtin_diff to use emit_line_*

2017-05-13 Thread Stefan Beller

In a later patch, I want to propose an option to detect
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This covers builtin_diff.

Signed-off-by: Stefan Beller 
---
 diff.c | 43 ++-
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/diff.c b/diff.c
index 93343a9ccc..8e00010bf4 100644
--- a/diff.c
+++ b/diff.c
@@ -1293,8 +1293,9 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
o->found_changes = 1;
 
if (ecbdata->header) {
-   fprintf(o->file, "%s", ecbdata->header->buf);
-   strbuf_reset(ecbdata->header);
+   emit_line(o, NULL, NULL,
+ ecbdata->header->buf, ecbdata->header->len);
+   strbuf_release(ecbdata->header);
ecbdata->header = NULL;
}
 
@@ -2407,7 +2408,7 @@ static void builtin_diff(const char *name_a,
b_two = quote_two(b_prefix, name_b + (*name_b == '/'));
lbl[0] = DIFF_FILE_VALID(one) ? a_one : "/dev/null";
lbl[1] = DIFF_FILE_VALID(two) ? b_two : "/dev/null";
-   strbuf_addf(, "%s%sdiff --git %s %s%s\n", line_prefix, meta, 
a_one, b_two, reset);
+   strbuf_addf(, "%sdiff --git %s %s%s\n", meta, a_one, b_two, 
reset);
if (lbl[0][0] == '/') {
/* /dev/null */
strbuf_addf(, "%s%snew file mode %06o%s\n", line_prefix, 
meta, two->mode, reset);
@@ -2439,7 +2440,7 @@ static void builtin_diff(const char *name_a,
if (complete_rewrite &&
(textconv_one || !diff_filespec_is_binary(one)) &&
(textconv_two || !diff_filespec_is_binary(two))) {
-   fprintf(o->file, "%s", header.buf);
+   emit_line(o, NULL, NULL, header.buf, header.len);
strbuf_reset();
emit_rewrite_diff(name_a, name_b, one, two,
textconv_one, textconv_two, o);
@@ -2449,7 +2450,8 @@ static void builtin_diff(const char *name_a,
}
 
if (o->irreversible_delete && lbl[1][0] == '/') {
-   fprintf(o->file, "%s", header.buf);
+   if (header.len)
+   emit_line(o, NULL, NULL, header.buf, header.len);
strbuf_reset();
goto free_ab_and_return;
} else if (!DIFF_OPT_TST(o, TEXT) &&
@@ -2459,13 +2461,16 @@ static void builtin_diff(const char *name_a,
S_ISREG(one->mode) && S_ISREG(two->mode) &&
!DIFF_OPT_TST(o, BINARY)) {
if (!oidcmp(>oid, >oid)) {
-   if (must_show_header)
-   fprintf(o->file, "%s", header.buf);
+   if (must_show_header && header.len)
+   emit_line(o, NULL, NULL,
+ header.buf, header.len);
goto free_ab_and_return;
}
-   fprintf(o->file, "%s", header.buf);
-   fprintf(o->file, "%sBinary files %s and %s differ\n",
-   line_prefix, lbl[0], lbl[1]);
+   if (header.len)
+   emit_line(o, NULL, NULL,
+ header.buf, header.len);
+   emit_line_fmt(o, 0, 0, "Binary files %s and %s 
differ\n",
+ lbl[0], lbl[1]);
goto free_ab_and_return;
}
if (fill_mmfile(, one) < 0 || fill_mmfile(, two) < 0)
@@ -2473,17 +2478,21 @@ static void builtin_diff(const char *name_a,
/* Quite common confusing case */
if (mf1.size == mf2.size &&
!memcmp(mf1.ptr, mf2.ptr, mf1.size)) {
-   if (must_show_header)
-   fprintf(o->file, "%s", header.buf);
+   if (must_show_header && header.len)
+   emit_line(o, NULL, NULL,
+ header.buf, header.len);
goto free_ab_and_return;
}
-   fprintf(o->file, "%s", header.buf);
+   if (header.len)
+   emit_line(o, NULL, NULL,
+ header.buf, header.len);
strbuf_reset();
if (DIFF_OPT_TST(o, BINARY))
emit_binary_diff(o->file,

[PATCH 19/19] diff.c: color moved lines differently

2017-05-13 Thread Stefan Beller

When there is a lot of code moved around such as in 11979b9 (2005-11-18,
"http.c: reorder to avoid compilation failure.") for example, the review
process is quite hard, as it is not mentally challenging.  It is a rather
tedious process, that gets boring quickly. However you still need to read
through all of the code to make sure the moved lines are there as supposed.

While it is trivial to color up a patch like the following

$ git diff
diff --git a/file2.c b/file2.c
index 9163a0f..8e66dc0 100644
--- a/file2.c
+++ b/file2.c
@@ -3,13 +3,6 @@ void *xmemdupz(const void *data, size_t len)
return memcpy(xmallocz(len), data, len);
 }

-int secure_foo(struct user *u)
-{
-   if (!u->is_allowed_foo)
-   return;
-   foo(u);
-}
-
 char *xstrndup(const char *str, size_t len)
 {
char *p = memchr(str, '\0', len);
diff --git a/test.c b/test.c
index a95e6fe..81eb0eb 100644
--- a/test.c
+++ b/test.c
@@ -18,6 +18,13 @@ ssize_t pread_in_full(int fd, void *buf, size_t count, 
off_t offset)
return total;
 }

+int secure_foo(struct user *u)
+{
+   if (!u->is_allowed_foo)
+   return;
+   foo(u);
+}
+
 int xdup(int fd)
 {
int ret = dup(fd);

as in this patch all lines that add or remove lines
should be colored in the new color that indicates moved
lines.

However the intention of this patch is to aid reviewers
to spotting permutations in the moved code. So consider the
following malicious move:

diff --git a/file2.c b/file2.c
index 9163a0f..8e66dc0 100644
--- a/file2.c
+++ b/file2.c
@@ -3,13 +3,6 @@ void *xmemdupz(const void *data, size_t len)
return memcpy(xmallocz(len), data, len);
 }

-int secure_foo(struct user *u)
-{
-   if (!u->is_allowed_foo)
-   return;
-   foo(u);
-}
-
 char *xstrndup(const char *str, size_t len)
 {
char *p = memchr(str, '\0', len);
diff --git a/test.c b/test.c
index a95e6fe..a679c40 100644
--- a/test.c
+++ b/test.c
@@ -18,6 +18,13 @@ ssize_t pread_in_full(int fd, void *buf, size_t count, 
off_t offset)
return total;
 }

+int secure_foo(struct user *u)
+{
+   foo(u);
+   if (!u->is_allowed_foo)
+   return;
+}
+
 int xdup(int fd)
 {
int ret = dup(fd);

If the moved code is larger, it is easier to hide some permutation in the
code, which is why we would not want to color all lines as "moved" in this
case. So we do not just need to color lines differently that are added and
removed in the same diff, we need to tweak the algorithm a bit more.

As the reviewers attention should be brought to the places, where the
difference is introduced to the moved code, we cannot just have one new
color for all of moved code.

First I implemented an alternative design, which would show a moved hunk
in one color, and its boundaries in another color. This idea was error
prone as it inspected each line and its neighboring lines to determine
if the line was (a) moved and (b) if was deep inside a hunk by having
matching neighboring lines. This is unreliable as the we can construct
hunks which have equal neighbors that just exceed the number of lines
inspected. See one of the tests as an example.

Instead this provides a dynamic programming greedy algorithm that finds
the largest moved hunk and then switches color to the alternative color
for the next hunk. By doing this any permutation is recognized and
displayed. That implies that there is no dedicated boundary or
inside-hunk color, but instead we'll have just two colors alternating
for hunks.

It would be a bit more UX friendly if the two corresponding hunks
(of added and deleted lines) for one move would get the same color id.
(Both get "regular moved" or "alternative moved"). This problem is
deferred to a later patch for now.

Algorithm-by: Jonathan Tan 
Signed-off-by: Stefan Beller 
---
 Documentation/config.txt   |  12 +-
 diff.c | 265 ++---
 diff.h |  21 +++-
 t/t4015-diff-whitespace.sh | 229 +++
 4 files changed, 506 insertions(+), 21 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 475e874d51..90403c06e3 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1051,14 +1051,22 @@ This does not affect linkgit:git-format-patch[1] or the
 'git-diff-{asterisk}' plumbing commands.  Can be overridden on the
 command line with the `--color[=]` option.
 
+color.moved::
+   A boolean value, whether a diff should color moved lines
+   differently. The moved lines are searched for in the diff only.
+   Duplicated lines from somewhere in the

[PATCH 16/19] diff.c: convert diff_summary to use emit_line_*

2017-05-13 Thread Stefan Beller

In a later patch, I want to propose an option to detect
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This covers diff_summary.

Signed-off-by: Stefan Beller 
---
 diff.c | 62 +-
 1 file changed, 33 insertions(+), 29 deletions(-)

diff --git a/diff.c b/diff.c
index 386b28cf47..899dc69dff 100644
--- a/diff.c
+++ b/diff.c
@@ -4504,67 +4504,71 @@ static void flush_one_pair(struct diff_filepair *p, 
struct diff_options *opt)
}
 }
 
-static void show_file_mode_name(FILE *file, const char *newdelete, struct 
diff_filespec *fs)
+static void show_file_mode_name(struct diff_options *opt, const char 
*newdelete, struct diff_filespec *fs)
 {
+   struct strbuf sb = STRBUF_INIT;
if (fs->mode)
-   fprintf(file, " %s mode %06o ", newdelete, fs->mode);
+   strbuf_addf(, " %s mode %06o ", newdelete, fs->mode);
else
-   fprintf(file, " %s ", newdelete);
-   write_name_quoted(fs->path, file, '\n');
+   strbuf_addf(, " %s ", newdelete);
+
+   quote_c_style(fs->path, , NULL, 0);
+   strbuf_addch(, '\n');
+   emit_line(opt, NULL, NULL, sb.buf, sb.len);
+   strbuf_release();
 }
 
 
-static void show_mode_change(FILE *file, struct diff_filepair *p, int 
show_name,
-   const char *line_prefix)
+static void show_mode_change(struct diff_options *opt, struct diff_filepair *p,
+   int show_name)
 {
if (p->one->mode && p->two->mode && p->one->mode != p->two->mode) {
-   fprintf(file, "%s mode change %06o => %06o%c", line_prefix, 
p->one->mode,
-   p->two->mode, show_name ? ' ' : '\n');
+   struct strbuf sb = STRBUF_INIT;
if (show_name) {
-   write_name_quoted(p->two->path, file, '\n');
+   strbuf_addch(, ' ');
+   quote_c_style(p->two->path, , NULL, 0);
}
+   emit_line_fmt(opt, NULL, NULL,
+ " mode change %06o => %06o%s\n",
+ p->one->mode, p->two->mode,
+ show_name ? sb.buf : "");
+   strbuf_release();
}
 }
 
-static void show_rename_copy(FILE *file, const char *renamecopy, struct 
diff_filepair *p,
-   const char *line_prefix)
+static void show_rename_copy(struct diff_options *opt, const char *renamecopy,
+   struct diff_filepair *p)
 {
char *names = pprint_rename(p->one->path, p->two->path);
-
-   fprintf(file, " %s %s (%d%%)\n", renamecopy, names, 
similarity_index(p));
+   emit_line_fmt(opt, NULL, NULL, " %s %s (%d%%)\n", renamecopy, names, 
similarity_index(p));
free(names);
-   show_mode_change(file, p, 0, line_prefix);
+   show_mode_change(opt, p, 0);
 }
 
 static void diff_summary(struct diff_options *opt, struct diff_filepair *p)
 {
-   FILE *file = opt->file;
-   const char *line_prefix = diff_line_prefix(opt);
-
switch(p->status) {
case DIFF_STATUS_DELETED:
-   fputs(line_prefix, file);
-   show_file_mode_name(file, "delete", p->one);
+   show_file_mode_name(opt, "delete", p->one);
break;
case DIFF_STATUS_ADDED:
-   fputs(line_prefix, file);
-   show_file_mode_name(file, "create", p->two);
+   show_file_mode_name(opt, "create", p->two);
break;
case DIFF_STATUS_COPIED:
-   fputs(line_prefix, file);
-   show_rename_copy(file, "copy", p, line_prefix);
+   show_rename_copy(opt, "copy", p);
break;
case DIFF_STATUS_RENAMED:
-   fputs(line_prefix, file);
-   show_rename_copy(file, "rename", p, line_prefix);
+   show_rename_copy(opt, "rename", p);
break;
default:
if (p->score) {
-   fprintf(file, "%s rewrite ", line_prefix);
-   write_name_quoted(p->two->path, file, ' ');
-   fprintf(file, "(%d%%)\n", similarity_index(p));
+   struct strbuf sb = STRBUF_INIT;
+   strbuf_addstr(, " rewrite ");
+   quote_c_style(p->two->path, , NULL, 0);
+   strbuf_addf(, " (%d%%)\n", similarity_index(p));
+   emit_line(opt, NULL, NULL, sb.buf, sb.len);
}
-   show_mode_change(file, p, !p->score, line_prefix);
+

[PATCH 10/19] diff.c: convert emit_rewrite_lines to use emit_line_*

2017-05-13 Thread Stefan Beller

In a later patch, I want to propose an option to detect
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This covers emit_rewrite_lines.

Signed-off-by: Stefan Beller 
---
 diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index e4b46fee4f..369c804f03 100644
--- a/diff.c
+++ b/diff.c
@@ -748,7 +748,7 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
if (!endp) {
const char *context = diff_get_color(ecb->color_diff,
 DIFF_CONTEXT);
-   putc('\n', ecb->opt->file);
+   emit_line(ecb->opt, NULL, NULL, "\n", 1);
emit_line_0(ecb->opt, context, reset, '\\',
nneof, strlen(nneof));
}
-- 
2.13.0.18.g183880de0a

[PATCH 14/19] diff.c: convert word diffing to use emit_line_*

2017-05-13 Thread Stefan Beller

In a later patch, I want to propose an option to detect
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This covers all code related to diffing words.

Signed-off-by: Stefan Beller 
---
 diff.c | 56 
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/diff.c b/diff.c
index 91dc045a45..07041a35ad 100644
--- a/diff.c
+++ b/diff.c
@@ -886,37 +886,38 @@ struct diff_words_data {
struct diff_words_style *style;
 };
 
-static int fn_out_diff_words_write_helper(FILE *fp,
+static int fn_out_diff_words_write_helper(struct diff_options *o,
  struct diff_words_style_elem *st_el,
  const char *newline,
  size_t count, const char *buf,
  const char *line_prefix)
 {
-   int print = 0;
+   struct strbuf sb = STRBUF_INIT;
 
while (count) {
char *p = memchr(buf, '\n', count);
-   if (print)
-   fputs(line_prefix, fp);
+
if (p != buf) {
-   if (st_el->color && fputs(st_el->color, fp) < 0)
-   return -1;
-   if (fputs(st_el->prefix, fp) < 0 ||
-   fwrite(buf, p ? p - buf : count, 1, fp) != 1 ||
-   fputs(st_el->suffix, fp) < 0)
-   return -1;
-   if (st_el->color && *st_el->color
-   && fputs(GIT_COLOR_RESET, fp) < 0)
-   return -1;
+   if (st_el->color)
+   strbuf_addstr(, st_el->color);
+   strbuf_addstr(, st_el->prefix);
+   strbuf_add(, buf, p ? p - buf : count);
+   strbuf_addstr(, st_el->suffix);
+   if (st_el->color && *st_el->color)
+   strbuf_addstr(, GIT_COLOR_RESET);
}
if (!p)
-   return 0;
-   if (fputs(newline, fp) < 0)
-   return -1;
+   goto out;
+   strbuf_addstr(, newline);
+   emit_line(o, NULL, NULL, sb.buf, sb.len);
+   strbuf_reset();
count -= p + 1 - buf;
buf = p + 1;
-   print = 1;
}
+out:
+   if (sb.len)
+   emit_line(o, NULL, NULL, sb.buf, sb.len);
+   strbuf_release();
return 0;
 }
 
@@ -994,25 +995,25 @@ static void fn_out_diff_words_aux(void *priv, char *line, 
unsigned long len)
} else
plus_begin = plus_end = diff_words->plus.orig[plus_first].end;
 
-   if (color_words_output_graph_prefix(diff_words)) {
-   fputs(line_prefix, diff_words->opt->file);
-   }
+   if (color_words_output_graph_prefix(diff_words))
+   emit_line(diff_words->opt, NULL, NULL, "", 0);
+
if (diff_words->current_plus != plus_begin) {
-   fn_out_diff_words_write_helper(diff_words->opt->file,
+   fn_out_diff_words_write_helper(diff_words->opt,
>ctx, style->newline,
plus_begin - diff_words->current_plus,
diff_words->current_plus, line_prefix);
if (*(plus_begin - 1) == '\n')
-   fputs(line_prefix, diff_words->opt->file);
+   emit_line(diff_words->opt, NULL, NULL, "", 0);
}
if (minus_begin != minus_end) {
-   fn_out_diff_words_write_helper(diff_words->opt->file,
+   fn_out_diff_words_write_helper(diff_words->opt,
>old, style->newline,
minus_end - minus_begin, minus_begin,
line_prefix);
}
if (plus_begin != plus_end) {
-   fn_out_diff_words_write_helper(diff_words->opt->file,
+   fn_out_diff_words_write_helper(diff_words->opt,
>new, style->newline,
plus_end - plus_begin, plus_begin,
line_prefix);
@@ -1109,8 +1110,7 @@ static void diff_words_show(struct diff_words_data 
*diff_words)
 
/* special case: only removal */
if (!diff_words->plus.text.size) {
-   fputs(line_prefix, diff_words->opt->file);
-

[PATCH 11/19] submodule.c: convert show_submodule_summary to use emit_line_fmt

2017-05-13 Thread Stefan Beller

In a later patch, I want to propose an option to detect
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This prepares the code for submodules to go through the
emit_line function.

Signed-off-by: Stefan Beller 
---
 diff.c  | 18 +++---
 diff.h  |  5 
 submodule.c | 78 ++---
 submodule.h |  9 +++
 4 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/diff.c b/diff.c
index 369c804f03..45ec311828 100644
--- a/diff.c
+++ b/diff.c
@@ -546,15 +546,15 @@ static void emit_line_0(struct diff_options *o, const 
char *set, const char *res
fputc('\n', file);
 }
 
-static void emit_line(struct diff_options *o, const char *set, const char 
*reset,
- const char *line, int len)
+void emit_line(struct diff_options *o, const char *set, const char *reset,
+  const char *line, int len)
 {
emit_line_0(o, set, reset, 0, line, len);
 }
 
-static void emit_line_fmt(struct diff_options *o,
- const char *set, const char *reset,
- const char *fmt, ...)
+void emit_line_fmt(struct diff_options *o,
+  const char *set, const char *reset,
+  const char *fmt, ...)
 {
struct strbuf sb = STRBUF_INIT;
va_list ap;
@@ -2379,8 +2379,7 @@ static void builtin_diff(const char *name_a,
(!two->mode || S_ISGITLINK(two->mode))) {
const char *del = diff_get_color_opt(o, DIFF_FILE_OLD);
const char *add = diff_get_color_opt(o, DIFF_FILE_NEW);
-   show_submodule_summary(o->file, one->path ? one->path : 
two->path,
-   line_prefix,
+   show_submodule_summary(o, one->path ? one->path : two->path,
>oid, >oid,
two->dirty_submodule,
meta, del, add, reset);
@@ -2390,11 +2389,10 @@ static void builtin_diff(const char *name_a,
   (!two->mode || S_ISGITLINK(two->mode))) {
const char *del = diff_get_color_opt(o, DIFF_FILE_OLD);
const char *add = diff_get_color_opt(o, DIFF_FILE_NEW);
-   show_submodule_inline_diff(o->file, one->path ? one->path : 
two->path,
-   line_prefix,
+   show_submodule_inline_diff(o, one->path ? one->path : two->path,
>oid, >oid,
two->dirty_submodule,
-   meta, del, add, reset, o);
+   meta, del, add, reset);
return;
}
 
diff --git a/diff.h b/diff.h
index 5be1ee77a7..addebd5a0f 100644
--- a/diff.h
+++ b/diff.h
@@ -188,6 +188,11 @@ struct diff_options {
int diff_path_counter;
 };
 
+void emit_line_fmt(struct diff_options *o, const char *set, const char *reset,
+  const char *fmt, ...);
+void emit_line(struct diff_options *o, const char *set, const char *reset,
+  const char *line, int len);
+
 enum color_diff {
DIFF_RESET = 0,
DIFF_CONTEXT = 1,
diff --git a/submodule.c b/submodule.c
index d3299e29c0..cfad469a2f 100644
--- a/submodule.c
+++ b/submodule.c
@@ -362,8 +362,8 @@ static int prepare_submodule_summary(struct rev_info *rev, 
const char *path,
return prepare_revision_walk(rev);
 }
 
-static void print_submodule_summary(struct rev_info *rev, FILE *f,
-   const char *line_prefix,
+static void print_submodule_summary(struct rev_info *rev,
+   struct diff_options *o,
const char *del, const char *add, const char *reset)
 {
static const char format[] = "  %m %s";
@@ -375,18 +375,12 @@ static void print_submodule_summary(struct rev_info *rev, 
FILE *f,
ctx.date_mode = rev->date_mode;
ctx.output_encoding = get_log_output_encoding();
strbuf_setlen(, 0);
-   strbuf_addstr(, line_prefix);
-   if (commit->object.flags & SYMMETRIC_LEFT) {
-   if (del)
-   strbuf_addstr(, del);
-   }
-   else if (add)
-   strbuf_addstr(, add);
format_commit_message(commit, format, , );
-   if (reset)
-   strbuf_addstr(, reset);
strbuf_addch(, '\n');
-   fprintf(f, "%s", sb.buf);
+   if (commit->object.flags & SYMMETRIC_LEFT)
+   emit_line(o, del,

[PATCH 07/19] diff.c: convert fn_out_consume to use emit_line_*

2017-05-13 Thread Stefan Beller

In a later patch, I want to propose an option to detect
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This covers the parts of fn_out_consume.

Signed-off-by: Stefan Beller 
---
 diff.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index aef159a919..93343a9ccc 100644
--- a/diff.c
+++ b/diff.c
@@ -1289,7 +1289,6 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
const char *context = diff_get_color(ecbdata->color_diff, DIFF_CONTEXT);
const char *reset = diff_get_color(ecbdata->color_diff, DIFF_RESET);
struct diff_options *o = ecbdata->opt;
-   const char *line_prefix = diff_line_prefix(o);
 
o->found_changes = 1;
 
@@ -1301,14 +1300,12 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
 
if (ecbdata->label_path[0]) {
const char *name_a_tab, *name_b_tab;
-
name_a_tab = strchr(ecbdata->label_path[0], ' ') ? "\t" : "";
name_b_tab = strchr(ecbdata->label_path[1], ' ') ? "\t" : "";
-
-   fprintf(o->file, "%s%s--- %s%s%s\n",
-   line_prefix, meta, ecbdata->label_path[0], reset, 
name_a_tab);
-   fprintf(o->file, "%s%s+++ %s%s%s\n",
-   line_prefix, meta, ecbdata->label_path[1], reset, 
name_b_tab);
+   emit_line_fmt(o, meta, reset, "--- %s%s\n",
+ ecbdata->label_path[0], name_a_tab);
+   emit_line_fmt(o, meta, reset, "+++ %s%s\n",
+ ecbdata->label_path[1], name_b_tab);
ecbdata->label_path[0] = ecbdata->label_path[1] = NULL;
}
 
@@ -1349,7 +1346,7 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
diff_words_flush(ecbdata);
if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
emit_line(o, context, reset, line, len);
-   fputs("~\n", o->file);
+   emit_line(o, NULL, NULL, "~\n", 2);
} else {
/*
 * Skip the prefix character, if any.  With
-- 
2.13.0.18.g183880de0a

[PATCH 17/19] diff.c: factor out emit_line_ws for coloring whitespaces

2017-05-13 Thread Stefan Beller

Introduce a helper that calls ws_check_emit. We'll have it as a helper
as in a later patch we'll add buffering in there.

In a later patch we want to buffer up all output and to do that
we'll need to keep around information for outputting a line such
as the whitespace information. We choose to put this information into
a new struct 'buffered_filepair', which later will be persisting longer
than builtin_diff.

Signed-off-by: Stefan Beller 
---
 diff.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index 899dc69dff..08dcc56bb9 100644
--- a/diff.c
+++ b/diff.c
@@ -552,6 +552,16 @@ void emit_line(struct diff_options *o, const char *set, 
const char *reset,
emit_line_0(o, set, reset, 0, line, len);
 }
 
+static void emit_line_ws(struct diff_options *o,
+const char *set, const char *reset, int sign,
+const char *line, int len,
+const char *ws, unsigned ws_rule)
+{
+   emit_line_0(o, set, reset, sign, "", 0);
+   ws_check_emit(line, len, ws_rule,
+ o->file, set, reset, ws);
+}
+
 void emit_line_fmt(struct diff_options *o,
   const char *set, const char *reset,
   const char *fmt, ...)
@@ -598,12 +608,10 @@ static void emit_line_checked(const char *reset,
else if (sign == '+' && new_blank_line_at_eof(ecbdata, line, len))
/* Blank line at EOF - paint '+' as well */
emit_line_0(ecbdata->opt, ws, reset, sign, line, len);
-   else {
+   else
/* Emit just the prefix, then the rest. */
-   emit_line_0(ecbdata->opt, set, reset, sign, "", 0);
-   ws_check_emit(line, len, ecbdata->ws_rule,
- ecbdata->opt->file, set, reset, ws);
-   }
+   emit_line_ws(ecbdata->opt, set, reset, sign, line, len,
+ws, ecbdata->ws_rule);
 }
 
 static void emit_add_line(const char *reset,
-- 
2.13.0.18.g183880de0a

[PATCH 02/19] diff: move line ending check into emit_hunk_header

2017-05-13 Thread Stefan Beller

In a later patch, I want to propose an option to detect
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This patch moves code that is conceptually part of
emit_hunk_header, but was using output in fn_out_consume,
back to emit_hunk_header.

Meanwhile simplify it by using a function that is designed for it.

Signed-off-by: Stefan Beller 
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 3f5bf8b5a4..c2ed605cd0 100644
--- a/diff.c
+++ b/diff.c
@@ -677,6 +677,8 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
}
 
strbuf_add(, line + len, org_len - len);
+   strbuf_complete_line();
+
emit_line(ecbdata->opt, "", "", msgbuf.buf, msgbuf.len);
strbuf_release();
 }
@@ -1315,8 +1317,6 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
len = sane_truncate_line(ecbdata, line, len);
find_lno(line, ecbdata);
emit_hunk_header(ecbdata, line, len);
-   if (line[len-1] != '\n')
-   putc('\n', o->file);
return;
}
 
-- 
2.13.0.18.g183880de0a

Re: checkout -b remotes/origin/ should not work

2017-05-13 Thread Jeff King

On Sat, May 13, 2017 at 08:52:37PM -0700, Stefan Beller wrote:

> NEEDSWORK:
> 
> checkout -b remotes/origin/ should not work, unless force is
> given (maybe?)
> 
> (I just run into that, now I have a remote tracking branch that points
> at my detached HEAD. Oh well.)

To be pedantic, you have a local branch with a funny name that points to
your detached HEAD. :)

I think this problem extends beyond "remotes/". The worst is:

  git checkout -b HEAD

but there are many confusing variants:

  git checkout -b refs/heads/foo
  git checkout -b tags/v1.0

etc. Those are all perfectly legal names, but almost certainly not what
the user intended. I think the plumbing should continue to allow them,
but I wouldn't object to some common-sense think-o protections in the
"checkout -b" plumbing (especially if it could be disabled for power
users).

-Peff

Re: [PATCH] interpret-trailers: obey scissors lines

2017-05-13 Thread Jeff King

On Sat, May 13, 2017 at 08:39:23PM -0700, Brian Malehorn wrote:

> If a commit message is being editted as "verbose", it will contain a
> scissors string ("-- >8 --") and a diff:
> 
> my subject
> 
> #  >8 
> # Do not touch the line above.
> # Everything below will be removed.
> diff --git a/foo.txt b/foo.txt
> index 5716ca5..7601807 100644
> --- a/foo.txt
> +++ b/foo.txt
> @@ -1 +1 @@
> -bar
> +baz
> 
> interpret-trailers doesn't interpret the scissors and therefore places
> trailer information after the diff. A simple reproduction is:
> 
> git config commit.verbose true
> GIT_EDITOR='git interpret-trailers --in-place --trailer Acked-by:me' \
> git commit --amend
> 
> This commit resolves the issue by teaching "git interpret-trailers" to
> obey scissors the same way "git commit" does.

Overall, this patch looks good to me. A few comments below.

The commit message explains the situation much better than the original.

> diff --git a/builtin/commit.c b/builtin/commit.c
> index 2de5f6cc6..2ce9c339d 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1735,7 +1735,8 @@ int cmd_commit(int argc, const char **argv, const char 
> *prefix)
>  
>   if (verbose || /* Truncate the message just before the diff, if any. */
>   cleanup_mode == CLEANUP_SCISSORS)
> - wt_status_truncate_message_at_cut_line();
> + strbuf_setlen(,
> +   wt_status_last_nonscissors_index(sb.buf, sb.len));

This hunk surprised me at first (that we would need to touch commit.c at
all), but the refactoring makes sense.

> @@ -1662,8 +1663,9 @@ int ignore_non_trailer(const char *buf, size_t len)
>   int boc = 0;
>   int bol = 0;
>   int in_old_conflicts_block = 0;
> + size_t cutoff = wt_status_last_nonscissors_index(buf, len);
>  
> - while (bol < len) {
> + while (bol < cutoff) {
>   const char *next_line = memchr(buf + bol, '\n', len - bol);
>  
>   if (!next_line)
> @@ -1689,5 +1691,5 @@ int ignore_non_trailer(const char *buf, size_t len)
>   }
>   bol = next_line - buf;
>   }
> - return boc ? len - boc : 0;
> + return boc ? len - boc : len - cutoff;
>  }

The change to interpret-trailers here ended up delightfully simple (and
looks right to me).

> diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
> index 4dd1d7c52..d88d4a4ff 100755
> --- a/t/t7513-interpret-trailers.sh
> +++ b/t/t7513-interpret-trailers.sh
> @@ -1258,4 +1258,21 @@ test_expect_success 'with no command and no key' '
>   test_cmp expected actual
>  '
>  
> +test_expect_success 'with scissors' '
> + cat >expected <<-EOF &&
> + my subject
> +
> + review: Brian
> + sign: A U Thor 
> + #  >8 
> + ignore this
> + EOF

Two minor style nits. One, we'd usually use "\EOF" here unless you
really do want to interpolate inside the here document. And two, we
usually indent the contents to the same level as the outer cat/EOF pair
(I actually don't mind at all how yours looks, but I just happened to
notice that it is slightly unlike our usual style).

> diff --git a/wt-status.c b/wt-status.c
> index 4bb46781c..8b807d11f 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -883,17 +883,18 @@ static void wt_longstatus_print_other(struct wt_status 
> *s,
>   status_printf_ln(s, GIT_COLOR_NORMAL, "%s", "");
>  }
>  
> -void wt_status_truncate_message_at_cut_line(struct strbuf *buf)
> +size_t wt_status_last_nonscissors_index(const char *s, size_t len)

I can see how the refactor changes this function (and it makes sense),
but the "last nonscissors index" seems like a funny term. It is really
the length, isn't, and therefore one past the last nonscissors index (or
another way of putting it: it's the first index of the scissors).

I wonder if it makes sense to call it "length".

Another way to think of it is still as a truncation. Our strip_suffix()
helper behaves quite similarly to this (not actually writing into the
buffer, but returning the new length). Perhaps something like
"wt_status_strip_scissors" would work.

-Peff

checkout -b remotes/origin/ should not work

2017-05-13 Thread Stefan Beller

NEEDSWORK:

checkout -b remotes/origin/ should not work, unless force is
given (maybe?)

(I just run into that, now I have a remote tracking branch that points
at my detached HEAD. Oh well.)

Re: [PATCH 0/3] interpret-trailers + commit -v bugfix

2017-05-13 Thread Brian Malehorn


> As I said, I'm a little iffy on doing this unconditionally, but it may
> be the least-bad solution. I'd just worry about collateral damage to
> somebody who doesn't use commit.verbose, but has something scissors-like
> in their commit message.
> 
> If you were to switch out is_scissors_line() for checking the exact
> cut_line[] from wt-status.c, I think that would be a big improvement.
> We'd still have the possibility of a false positive, but it would be
> much less likely in practice.

Yes, you're probably right. Using is_scissors_line() was the path of
least resistance to fix my bug, but wasn't really the Right Thing.

I've made a new patch that uses the wt-status.c code to determine
scissors lines. "git interpret-trailers" now uses the same logic as
"git commit". This patch replaces the original 3.

And yeah, this is yet another heuristic in interpret-trailers aimed at
git commit messages. But it's hardly the first heuristic we've added,
and I'd say it makes more sense for interpret-trailers and commit to
parse the same format.

[PATCH] interpret-trailers: obey scissors lines

2017-05-13 Thread Brian Malehorn

If a commit message is being editted as "verbose", it will contain a
scissors string ("-- >8 --") and a diff:

my subject

#  >8 
# Do not touch the line above.
# Everything below will be removed.
diff --git a/foo.txt b/foo.txt
index 5716ca5..7601807 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1 +1 @@
-bar
+baz

interpret-trailers doesn't interpret the scissors and therefore places
trailer information after the diff. A simple reproduction is:

git config commit.verbose true
GIT_EDITOR='git interpret-trailers --in-place --trailer Acked-by:me' \
git commit --amend

This commit resolves the issue by teaching "git interpret-trailers" to
obey scissors the same way "git commit" does.
---
 builtin/commit.c  |  3 ++-
 commit.c  |  6 --
 t/t7513-interpret-trailers.sh | 17 +
 wt-status.c   | 11 ++-
 wt-status.h   |  2 +-
 5 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 2de5f6cc6..2ce9c339d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1735,7 +1735,8 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
 
if (verbose || /* Truncate the message just before the diff, if any. */
cleanup_mode == CLEANUP_SCISSORS)
-   wt_status_truncate_message_at_cut_line();
+   strbuf_setlen(,
+ wt_status_last_nonscissors_index(sb.buf, sb.len));
 
if (cleanup_mode != CLEANUP_NONE)
strbuf_stripspace(, cleanup_mode == CLEANUP_ALL);
diff --git a/commit.c b/commit.c
index fab826973..5d791b703 100644
--- a/commit.c
+++ b/commit.c
@@ -11,6 +11,7 @@
 #include "commit-slab.h"
 #include "prio-queue.h"
 #include "sha1-lookup.h"
+#include "wt-status.h"
 
 static struct commit_extra_header *read_commit_extra_header_lines(const char 
*buf, size_t len, const char **);
 
@@ -1662,8 +1663,9 @@ int ignore_non_trailer(const char *buf, size_t len)
int boc = 0;
int bol = 0;
int in_old_conflicts_block = 0;
+   size_t cutoff = wt_status_last_nonscissors_index(buf, len);
 
-   while (bol < len) {
+   while (bol < cutoff) {
const char *next_line = memchr(buf + bol, '\n', len - bol);
 
if (!next_line)
@@ -1689,5 +1691,5 @@ int ignore_non_trailer(const char *buf, size_t len)
}
bol = next_line - buf;
}
-   return boc ? len - boc : 0;
+   return boc ? len - boc : len - cutoff;
 }
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 4dd1d7c52..d88d4a4ff 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -1258,4 +1258,21 @@ test_expect_success 'with no command and no key' '
test_cmp expected actual
 '
 
+test_expect_success 'with scissors' '
+   cat >expected <<-EOF &&
+   my subject
+
+   review: Brian
+   sign: A U Thor 
+   #  >8 
+   ignore this
+   EOF
+   git interpret-trailers --trailer review:Brian >actual <<-EOF &&
+   my subject
+   #  >8 
+   ignore this
+   EOF
+   test_cmp expected actual
+'
+
 test_done
diff --git a/wt-status.c b/wt-status.c
index 4bb46781c..8b807d11f 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -883,17 +883,18 @@ static void wt_longstatus_print_other(struct wt_status *s,
status_printf_ln(s, GIT_COLOR_NORMAL, "%s", "");
 }
 
-void wt_status_truncate_message_at_cut_line(struct strbuf *buf)
+size_t wt_status_last_nonscissors_index(const char *s, size_t len)
 {
const char *p;
struct strbuf pattern = STRBUF_INIT;
 
strbuf_addf(, "\n%c %s", comment_line_char, cut_line);
-   if (starts_with(buf->buf, pattern.buf + 1))
-   strbuf_setlen(buf, 0);
-   else if ((p = strstr(buf->buf, pattern.buf)))
-   strbuf_setlen(buf, p - buf->buf + 1);
+   if (starts_with(s, pattern.buf + 1))
+   len = 0;
+   else if ((p = strstr(s, pattern.buf)))
+   len = p - s + 1;
strbuf_release();
+   return len;
 }
 
 void wt_status_add_cut_line(FILE *fp)
diff --git a/wt-status.h b/wt-status.h
index 54fec7703..36a21b492 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -112,7 +112,7 @@ struct wt_status_state {
unsigned char cherry_pick_head_sha1[20];
 };
 
-void wt_status_truncate_message_at_cut_line(struct strbuf *);
+size_t wt_status_last_nonscissors_index(const char *s, size_t len);
 void wt_status_add_cut_line(FILE *fp);
 void wt_status_prepare(struct wt_status *s);
 void wt_status_print(struct wt_status *s);
-- 
2.12.3.9.g050893b

[RFC PATCH v2 22/22] blame: create entry prepend function in libgit

2017-05-13 Thread Jeff Smith

Signed-off-by: Jeff Smith 
---
 blame.c | 16 
 blame.h |  2 ++
 builtin/blame.c | 11 +--
 3 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/blame.c b/blame.c
index f6c9cb7..00404b9 100644
--- a/blame.c
+++ b/blame.c
@@ -1845,3 +1845,19 @@ void setup_scoreboard(struct blame_scoreboard *sb, const 
char *path, struct blam
if (orig)
*orig = o;
 }
+
+
+
+struct blame_entry *blame_entry_prepend(struct blame_entry *head,
+   long start, long end,
+   struct blame_origin *o)
+{
+   struct blame_entry *new_head = xcalloc(1, sizeof(struct blame_entry));
+   new_head->lno = start;
+   new_head->num_lines = end - start;
+   new_head->suspect = o;
+   new_head->s_lno = start;
+   new_head->next = head;
+   blame_origin_incref(o);
+   return new_head;
+}
diff --git a/blame.h b/blame.h
index 76fd8ef..a6c915c 100644
--- a/blame.h
+++ b/blame.h
@@ -170,4 +170,6 @@ extern const char *blame_nth_line(struct blame_scoreboard 
*sb, long lno);
 extern void init_scoreboard(struct blame_scoreboard *sb);
 extern void setup_scoreboard(struct blame_scoreboard *sb, const char *path, 
struct blame_origin **orig);
 
+extern struct blame_entry *blame_entry_prepend(struct blame_entry *head, long 
start, long end, struct blame_origin *o);
+
 #endif /* BLAME_H */
diff --git a/builtin/blame.c b/builtin/blame.c
index 7ec5a73..8a858b0 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -890,16 +890,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
 
for (range_i = ranges.nr; range_i > 0; --range_i) {
const struct range *r = [range_i - 1];
-   long bottom = r->start;
-   long top = r->end;
-   struct blame_entry *next = ent;
-   ent = xcalloc(1, sizeof(*ent));
-   ent->lno = bottom;
-   ent->num_lines = top - bottom;
-   ent->suspect = o;
-   ent->s_lno = bottom;
-   ent->next = next;
-   blame_origin_incref(o);
+   ent = blame_entry_prepend(ent, r->start, r->end, o);
}
 
o->suspects = ent;
-- 
2.9.3

[RFC PATCH v2 12/22] blame: move no_whole_file_rename flag to scoreboard

2017-05-13 Thread Jeff Smith

The no_whole_file_rename flag is used in parts of blame that are being
moved to libgit, and should be accessible via the scoreboard structure.

Signed-off-by: Jeff Smith 
---
 blame.h | 1 +
 builtin/blame.c | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/blame.h b/blame.h
index 42a948d..c140f41 100644
--- a/blame.h
+++ b/blame.h
@@ -133,6 +133,7 @@ struct blame_scoreboard {
int reverse;
int show_root;
int xdl_opts;
+   int no_whole_file_rename;
 };
 
 #endif /* BLAME_H */
diff --git a/builtin/blame.c b/builtin/blame.c
index d3af02b..bd295eb 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1285,7 +1285,7 @@ static void pass_blame(struct blame_scoreboard *sb, 
struct blame_origin *origin,
 * The first pass looks for unrenamed path to optimize for
 * common cases, then we look for renames in the second pass.
 */
-   for (pass = 0; pass < 2 - no_whole_file_rename; pass++) {
+   for (pass = 0; pass < 2 - sb->no_whole_file_rename; pass++) {
struct blame_origin *(*find)(struct commit *, struct 
blame_origin *);
find = pass ? find_rename : find_origin;
 
@@ -2763,6 +2763,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
 
sb.show_root = show_root;
sb.xdl_opts = xdl_opts;
+   sb.no_whole_file_rename = no_whole_file_rename;
 
read_mailmap(, NULL);
 
-- 
2.9.3

[RFC PATCH v2 16/22] blame: rework methods that determine 'final' commit

2017-05-13 Thread Jeff Smith

Either prepare_initial or prepare_final is used to determine which
commit is marked as 'final'. Call the underlying methods directly to
make this more clear.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 49 +++--
 1 file changed, 23 insertions(+), 26 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 94c9ded..3d85f62 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2297,14 +2297,8 @@ static struct commit *find_single_final(struct rev_info 
*revs,
return found;
 }
 
-static char *prepare_final(struct blame_scoreboard *sb)
-{
-   const char *name;
-   sb->final = find_single_final(sb->revs, );
-   return xstrdup_or_null(name);
-}
-
-static const char *dwim_reverse_initial(struct blame_scoreboard *sb)
+static struct commit *dwim_reverse_initial(struct rev_info *revs,
+  const char **name_p)
 {
/*
 * DWIM "git blame --reverse ONE -- PATH" as
@@ -2315,11 +2309,11 @@ static const char *dwim_reverse_initial(struct 
blame_scoreboard *sb)
struct commit *head_commit;
unsigned char head_sha1[20];
 
-   if (sb->revs->pending.nr != 1)
+   if (revs->pending.nr != 1)
return NULL;
 
/* Is that sole rev a committish? */
-   obj = sb->revs->pending.objects[0].item;
+   obj = revs->pending.objects[0].item;
obj = deref_tag(obj, NULL, 0);
if (obj->type != OBJ_COMMIT)
return NULL;
@@ -2333,17 +2327,19 @@ static const char *dwim_reverse_initial(struct 
blame_scoreboard *sb)
 
/* Turn "ONE" into "ONE..HEAD" then */
obj->flags |= UNINTERESTING;
-   add_pending_object(sb->revs, _commit->object, "HEAD");
+   add_pending_object(revs, _commit->object, "HEAD");
 
-   sb->final = (struct commit *)obj;
-   return sb->revs->pending.objects[0].name;
+   if (name_p)
+   *name_p = revs->pending.objects[0].name;
+   return (struct commit *)obj;
 }
 
-static char *prepare_initial(struct blame_scoreboard *sb)
+static struct commit *find_single_initial(struct rev_info *revs,
+ const char **name_p)
 {
int i;
const char *final_commit_name = NULL;
-   struct rev_info *revs = sb->revs;
+   struct commit *found = NULL;
 
/*
 * There must be one and only one negative commit, and it must be
@@ -2356,19 +2352,22 @@ static char *prepare_initial(struct blame_scoreboard 
*sb)
obj = deref_tag(obj, NULL, 0);
if (obj->type != OBJ_COMMIT)
die("Non commit %s?", revs->pending.objects[i].name);
-   if (sb->final)
+   if (found)
die("More than one commit to dig up from, %s and %s?",
revs->pending.objects[i].name,
final_commit_name);
-   sb->final = (struct commit *) obj;
+   found = (struct commit *) obj;
final_commit_name = revs->pending.objects[i].name;
}
 
if (!final_commit_name)
-   final_commit_name = dwim_reverse_initial(sb);
+   found = dwim_reverse_initial(revs, _commit_name);
if (!final_commit_name)
die("No commit to dig up from?");
-   return xstrdup(final_commit_name);
+
+   if (name_p)
+   *name_p = final_commit_name;
+   return found;
 }
 
 static int blame_copy_callback(const struct option *option, const char *arg, 
int unset)
@@ -2412,7 +2411,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
struct blame_origin *o;
struct blame_entry *ent = NULL;
long dashdash_pos, lno;
-   char *final_commit_name = NULL;
+   const char *final_commit_name = NULL;
enum object_type type;
struct commit *final_commit = NULL;
struct progress_info pi = { NULL, 0 };
@@ -2621,14 +2620,15 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
sb.revs = 
sb.contents_from = contents_from;
sb.reverse = reverse;
+
if (!reverse) {
-   final_commit_name = prepare_final();
+   sb.final = find_single_final(, _commit_name);
sb.commits.compare = compare_commits_by_commit_date;
}
else if (contents_from)
die(_("--contents and --reverse do not blend well."));
else {
-   final_commit_name = prepare_initial();
+   sb.final = find_single_initial(, _commit_name);
sb.commits.compare = compare_commits_by_reverse_commit_date;
if (revs.first_parent_only)
revs.children.name = NULL;
@@ -2783,10 +2783,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
 
if (!incremental)
setup_pager();
-
-   free(final_commit_name);

[RFC PATCH v2 14/22] blame: move progess updates to a scoreboard callback

2017-05-13 Thread Jeff Smith

Allow the interface user to decide how to handle a progress update.

Signed-off-by: Jeff Smith 
---
 blame.h |  3 +++
 builtin/blame.c | 24 ++--
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/blame.h b/blame.h
index e85db06..a0bd91b 100644
--- a/blame.h
+++ b/blame.h
@@ -138,6 +138,9 @@ struct blame_scoreboard {
 
/* callbacks */
void(*on_sanity_fail)(struct blame_scoreboard *, int);
+   void(*found_guilty_entry)(struct blame_entry *, void *);
+
+   void *found_guilty_entry_data;
 };
 
 #endif /* BLAME_H */
diff --git a/builtin/blame.c b/builtin/blame.c
index d7b3b5a..a40f743 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1598,9 +1598,10 @@ static int emit_one_suspect_detail(struct blame_origin 
*suspect, int repeat)
  * The blame_entry is found to be guilty for the range.
  * Show it in incremental output.
  */
-static void found_guilty_entry(struct blame_entry *ent,
-  struct progress_info *pi)
+static void found_guilty_entry(struct blame_entry *ent, void *data)
 {
+   struct progress_info *pi = (struct progress_info *)data;
+
if (incremental) {
struct blame_origin *suspect = ent->suspect;
 
@@ -1623,11 +1624,6 @@ static void assign_blame(struct blame_scoreboard *sb, 
int opt)
 {
struct rev_info *revs = sb->revs;
struct commit *commit = prio_queue_get(>commits);
-   struct progress_info pi = { NULL, 0 };
-
-   if (show_progress)
-   pi.progress = start_progress_delay(_("Blaming lines"),
-  sb->num_lines, 50, 1);
 
while (commit) {
struct blame_entry *ent;
@@ -1669,7 +1665,8 @@ static void assign_blame(struct blame_scoreboard *sb, int 
opt)
suspect->guilty = 1;
for (;;) {
struct blame_entry *next = ent->next;
-   found_guilty_entry(ent, );
+   if (sb->found_guilty_entry)
+   sb->found_guilty_entry(ent, 
sb->found_guilty_entry_data);
if (next) {
ent = next;
continue;
@@ -1685,8 +1682,6 @@ static void assign_blame(struct blame_scoreboard *sb, int 
opt)
if (sb->debug) /* sanity */
sanity_check_refcnt(sb);
}
-
-   stop_progress();
 }
 
 static const char *format_time(unsigned long time, const char *tz_str,
@@ -2419,6 +2414,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
char *final_commit_name = NULL;
enum object_type type;
struct commit *final_commit = NULL;
+   struct progress_info pi = { NULL, 0 };
 
struct string_list range_list = STRING_LIST_INIT_NODUP;
int output_option = 0, opt = 0;
@@ -2774,8 +2770,16 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
 
read_mailmap(, NULL);
 
+   sb.found_guilty_entry = _guilty_entry;
+   sb.found_guilty_entry_data = 
+   if (show_progress)
+   pi.progress = start_progress_delay(_("Blaming lines"),
+  sb.num_lines, 50, 1);
+
assign_blame(, opt);
 
+   stop_progress();
+
if (!incremental)
setup_pager();
 
-- 
2.9.3

[RFC PATCH v2 15/22] blame: wrap blame_sort and compare_blame_final

2017-05-13 Thread Jeff Smith

The new method's interface is marginally cleaner than blame_sort, and
will avoid the need to expose the compare_blame_final method.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index a40f743..94c9ded 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -249,12 +249,6 @@ static int compare_blame_suspect(const void *p1, const 
void *p2)
return s1->s_lno > s2->s_lno ? 1 : -1;
 }
 
-static struct blame_entry *blame_sort(struct blame_entry *head,
- int (*compare_fn)(const void *, const 
void *))
-{
-   return llist_mergesort (head, get_next_blame, set_next_blame, 
compare_fn);
-}
-
 static int compare_commits_by_reverse_commit_date(const void *a,
  const void *b,
  void *c)
@@ -262,6 +256,12 @@ static int compare_commits_by_reverse_commit_date(const 
void *a,
return -compare_commits_by_commit_date(a, b, c);
 }
 
+static void blame_sort_final(struct blame_scoreboard *sb)
+{
+   sb->ent = llist_mergesort(sb->ent, get_next_blame, set_next_blame,
+ compare_blame_final);
+}
+
 static void sanity_check_refcnt(struct blame_scoreboard *);
 
 /*
@@ -1244,7 +1244,8 @@ static int num_scapegoats(struct rev_info *revs, struct 
commit *commit, int reve
  */
 static void distribute_blame(struct blame_scoreboard *sb, struct blame_entry 
*blamed)
 {
-   blamed = blame_sort(blamed, compare_blame_suspect);
+   blamed = llist_mergesort(blamed, get_next_blame, set_next_blame,
+compare_blame_suspect);
while (blamed)
{
struct blame_origin *porigin = blamed->suspect;
@@ -2788,7 +2789,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
if (incremental)
return 0;
 
-   sb.ent = blame_sort(sb.ent, compare_blame_final);
+   blame_sort_final();
 
coalesce();
 
-- 
2.9.3

[RFC PATCH v2 17/22] blame: move origin-related methods to libgit

2017-05-13 Thread Jeff Smith

Signed-off-by: Jeff Smith 
---
 Makefile|   1 +
 blame.c |  62 +
 blame.h |  15 +++
 builtin/blame.c | 120 
 4 files changed, 102 insertions(+), 96 deletions(-)
 create mode 100644 blame.c

diff --git a/Makefile b/Makefile
index e35542e..2d795ed 100644
--- a/Makefile
+++ b/Makefile
@@ -718,6 +718,7 @@ LIB_OBJS += argv-array.o
 LIB_OBJS += attr.o
 LIB_OBJS += base85.o
 LIB_OBJS += bisect.o
+LIB_OBJS += blame.o
 LIB_OBJS += blob.o
 LIB_OBJS += branch.o
 LIB_OBJS += bulk-checkin.o
diff --git a/blame.c b/blame.c
new file mode 100644
index 000..4855d6d
--- /dev/null
+++ b/blame.c
@@ -0,0 +1,62 @@
+#include "blame.h"
+
+void blame_origin_decref(struct blame_origin *o)
+{
+   if (o && --o->refcnt <= 0) {
+   struct blame_origin *p, *l = NULL;
+   if (o->previous)
+   blame_origin_decref(o->previous);
+   free(o->file.ptr);
+   /* Should be present exactly once in commit chain */
+   for (p = o->commit->util; p; l = p, p = p->next) {
+   if (p == o) {
+   if (l)
+   l->next = p->next;
+   else
+   o->commit->util = p->next;
+   free(o);
+   return;
+   }
+   }
+   die("internal error in blame_origin_decref");
+   }
+}
+
+/*
+ * Given a commit and a path in it, create a new origin structure.
+ * The callers that add blame to the scoreboard should use
+ * get_origin() to obtain shared, refcounted copy instead of calling
+ * this function directly.
+ */
+struct blame_origin *make_origin(struct commit *commit, const char *path)
+{
+   struct blame_origin *o;
+   FLEX_ALLOC_STR(o, path, path);
+   o->commit = commit;
+   o->refcnt = 1;
+   o->next = commit->util;
+   commit->util = o;
+   return o;
+}
+
+/*
+ * Locate an existing origin or create a new one.
+ * This moves the origin to front position in the commit util list.
+ */
+struct blame_origin *get_origin(struct commit *commit, const char *path)
+{
+   struct blame_origin *o, *l;
+
+   for (o = commit->util, l = NULL; o; l = o, o = o->next) {
+   if (!strcmp(o->path, path)) {
+   /* bump to front */
+   if (l) {
+   l->next = o->next;
+   o->next = commit->util;
+   commit->util = o;
+   }
+   return blame_origin_incref(o);
+   }
+   }
+   return make_origin(commit, path);
+}
diff --git a/blame.h b/blame.h
index a0bd91b..39f8865 100644
--- a/blame.h
+++ b/blame.h
@@ -143,4 +143,19 @@ struct blame_scoreboard {
void *found_guilty_entry_data;
 };
 
+/*
+ * Origin is refcounted and usually we keep the blob contents to be
+ * reused.
+ */
+static inline struct blame_origin *blame_origin_incref(struct blame_origin *o)
+{
+   if (o)
+   o->refcnt++;
+   return o;
+}
+extern void blame_origin_decref(struct blame_origin *o);
+
+extern struct blame_origin *make_origin(struct commit *commit, const char 
*path);
+extern struct blame_origin *get_origin(struct commit *commit, const char 
*path);
+
 #endif /* BLAME_H */
diff --git a/builtin/blame.c b/builtin/blame.c
index 3d85f62..b60d604 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -122,39 +122,6 @@ static void fill_origin_blob(struct diff_options *opt,
*file = o->file;
 }
 
-/*
- * Origin is refcounted and usually we keep the blob contents to be
- * reused.
- */
-static inline struct blame_origin *origin_incref(struct blame_origin *o)
-{
-   if (o)
-   o->refcnt++;
-   return o;
-}
-
-static void origin_decref(struct blame_origin *o)
-{
-   if (o && --o->refcnt <= 0) {
-   struct blame_origin *p, *l = NULL;
-   if (o->previous)
-   origin_decref(o->previous);
-   free(o->file.ptr);
-   /* Should be present exactly once in commit chain */
-   for (p = o->commit->util; p; l = p, p = p->next) {
-   if (p == o) {
-   if (l)
-   l->next = p->next;
-   else
-   o->commit->util = p->next;
-   free(o);
-   return;
-   }
-   }
-   die("internal error in blame::origin_decref");
-   }
-}
-
 static void drop_origin_blob(struct blame_origin *o)
 {
if (o->file.ptr) {
@@ -278,7 +245,7 @@ static void coalesce(struct blame_scoreboard *sb)

[RFC PATCH v2 20/22] blame: create scoreboard init function in libgit

2017-05-13 Thread Jeff Smith

Signed-off-by: Jeff Smith 
---
 blame.c | 7 +++
 blame.h | 2 ++
 builtin/blame.c | 4 +---
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/blame.c b/blame.c
index 798e61b..17ebf64 100644
--- a/blame.c
+++ b/blame.c
@@ -1574,3 +1574,10 @@ void assign_blame(struct blame_scoreboard *sb, int opt)
sanity_check_refcnt(sb);
}
 }
+
+void init_scoreboard(struct blame_scoreboard *sb)
+{
+   memset(sb, 0, sizeof(struct blame_scoreboard));
+   sb->move_score = BLAME_DEFAULT_MOVE_SCORE;
+   sb->copy_score = BLAME_DEFAULT_COPY_SCORE;
+}
diff --git a/blame.h b/blame.h
index a3ea677..9477b6e 100644
--- a/blame.h
+++ b/blame.h
@@ -171,4 +171,6 @@ extern unsigned blame_entry_score(struct blame_scoreboard 
*sb, struct blame_entr
 extern void assign_blame(struct blame_scoreboard *sb, int opt);
 extern const char *blame_nth_line(struct blame_scoreboard *sb, long lno);
 
+extern void init_scoreboard(struct blame_scoreboard *sb);
+
 #endif /* BLAME_H */
diff --git a/builtin/blame.c b/builtin/blame.c
index 207a74d..e01265a 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1026,10 +1026,8 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
 
revs.disable_stdin = 1;
setup_revisions(argc, argv, , NULL);
-   memset(, 0, sizeof(sb));
-   sb.move_score = BLAME_DEFAULT_MOVE_SCORE;
-   sb.copy_score = BLAME_DEFAULT_COPY_SCORE;
 
+   init_scoreboard();
sb.revs = 
sb.contents_from = contents_from;
sb.reverse = reverse;
-- 
2.9.3

[RFC PATCH v2 13/22] blame: make sanity_check use a callback in scoreboard

2017-05-13 Thread Jeff Smith

Allow the interface user to decide how to handle a failed sanity check,
whether that be to output with the current state or to do nothing.

Signed-off-by: Jeff Smith 
---
 blame.h |  4 
 builtin/blame.c | 23 +++
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/blame.h b/blame.h
index c140f41..e85db06 100644
--- a/blame.h
+++ b/blame.h
@@ -134,6 +134,10 @@ struct blame_scoreboard {
int show_root;
int xdl_opts;
int no_whole_file_rename;
+   int debug;
+
+   /* callbacks */
+   void(*on_sanity_fail)(struct blame_scoreboard *, int);
 };
 
 #endif /* BLAME_H */
diff --git a/builtin/blame.c b/builtin/blame.c
index bd295eb..d7b3b5a 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -285,7 +285,7 @@ static void coalesce(struct blame_scoreboard *sb)
}
}
 
-   if (DEBUG) /* sanity */
+   if (sb->debug) /* sanity */
sanity_check_refcnt(sb);
 }
 
@@ -1682,7 +1682,7 @@ static void assign_blame(struct blame_scoreboard *sb, int 
opt)
}
origin_decref(suspect);
 
-   if (DEBUG) /* sanity */
+   if (sb->debug) /* sanity */
sanity_check_refcnt(sb);
}
 
@@ -2021,12 +2021,16 @@ static void sanity_check_refcnt(struct blame_scoreboard 
*sb)
baa = 1;
}
}
-   if (baa) {
-   int opt = 0160;
-   find_alignment(sb, );
-   output(sb, opt);
-   die("Baa %d!", baa);
-   }
+   if (baa)
+   sb->on_sanity_fail(sb, baa);
+}
+
+static void sanity_check_on_fail(struct blame_scoreboard *sb, int baa)
+{
+   int opt = OUTPUT_SHOW_SCORE | OUTPUT_SHOW_NUMBER | OUTPUT_SHOW_NAME;
+   find_alignment(sb, );
+   output(sb, opt);
+   die("Baa %d!", baa);
 }
 
 static unsigned parse_score(const char *arg)
@@ -2761,6 +2765,9 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
if (blame_copy_score)
sb.copy_score = blame_copy_score;
 
+   sb.debug = DEBUG;
+   sb.on_sanity_fail = _check_on_fail;
+
sb.show_root = show_root;
sb.xdl_opts = xdl_opts;
sb.no_whole_file_rename = no_whole_file_rename;
-- 
2.9.3

[RFC PATCH v2 18/22] blame: move fake-commit-related methods to libgit

2017-05-13 Thread Jeff Smith

Signed-off-by: Jeff Smith 
---
 blame.c | 203 +++-
 blame.h |   4 +-
 builtin/blame.c | 197 --
 3 files changed, 205 insertions(+), 199 deletions(-)

diff --git a/blame.c b/blame.c
index 4855d6d..8c025bc 100644
--- a/blame.c
+++ b/blame.c
@@ -1,3 +1,6 @@
+#include "cache.h"
+#include "refs.h"
+#include "cache-tree.h"
 #include "blame.h"
 
 void blame_origin_decref(struct blame_origin *o)
@@ -28,7 +31,7 @@ void blame_origin_decref(struct blame_origin *o)
  * get_origin() to obtain shared, refcounted copy instead of calling
  * this function directly.
  */
-struct blame_origin *make_origin(struct commit *commit, const char *path)
+static struct blame_origin *make_origin(struct commit *commit, const char 
*path)
 {
struct blame_origin *o;
FLEX_ALLOC_STR(o, path, path);
@@ -60,3 +63,201 @@ struct blame_origin *get_origin(struct commit *commit, 
const char *path)
}
return make_origin(commit, path);
 }
+
+
+
+static void verify_working_tree_path(struct commit *work_tree, const char 
*path)
+{
+   struct commit_list *parents;
+   int pos;
+
+   for (parents = work_tree->parents; parents; parents = parents->next) {
+   const struct object_id *commit_oid = >item->object.oid;
+   struct object_id blob_oid;
+   unsigned mode;
+
+   if (!get_tree_entry(commit_oid->hash, path, blob_oid.hash, 
) &&
+   sha1_object_info(blob_oid.hash, NULL) == OBJ_BLOB)
+   return;
+   }
+
+   pos = cache_name_pos(path, strlen(path));
+   if (pos >= 0)
+   ; /* path is in the index */
+   else if (-1 - pos < active_nr &&
+!strcmp(active_cache[-1 - pos]->name, path))
+   ; /* path is in the index, unmerged */
+   else
+   die("no such path '%s' in HEAD", path);
+}
+
+static struct commit_list **append_parent(struct commit_list **tail, const 
struct object_id *oid)
+{
+   struct commit *parent;
+
+   parent = lookup_commit_reference(oid->hash);
+   if (!parent)
+   die("no such commit %s", oid_to_hex(oid));
+   return _list_insert(parent, tail)->next;
+}
+
+static void append_merge_parents(struct commit_list **tail)
+{
+   int merge_head;
+   struct strbuf line = STRBUF_INIT;
+
+   merge_head = open(git_path_merge_head(), O_RDONLY);
+   if (merge_head < 0) {
+   if (errno == ENOENT)
+   return;
+   die("cannot open '%s' for reading", git_path_merge_head());
+   }
+
+   while (!strbuf_getwholeline_fd(, merge_head, '\n')) {
+   struct object_id oid;
+   if (line.len < GIT_SHA1_HEXSZ || get_oid_hex(line.buf, ))
+   die("unknown line in '%s': %s", git_path_merge_head(), 
line.buf);
+   tail = append_parent(tail, );
+   }
+   close(merge_head);
+   strbuf_release();
+}
+
+/*
+ * This isn't as simple as passing sb->buf and sb->len, because we
+ * want to transfer ownership of the buffer to the commit (so we
+ * must use detach).
+ */
+static void set_commit_buffer_from_strbuf(struct commit *c, struct strbuf *sb)
+{
+   size_t len;
+   void *buf = strbuf_detach(sb, );
+   set_commit_buffer(c, buf, len);
+}
+
+/*
+ * Prepare a dummy commit that represents the work tree (or staged) item.
+ * Note that annotating work tree item never works in the reverse.
+ */
+struct commit *fake_working_tree_commit(struct diff_options *opt,
+   const char *path,
+   const char *contents_from)
+{
+   struct commit *commit;
+   struct blame_origin *origin;
+   struct commit_list **parent_tail, *parent;
+   struct object_id head_oid;
+   struct strbuf buf = STRBUF_INIT;
+   const char *ident;
+   time_t now;
+   int size, len;
+   struct cache_entry *ce;
+   unsigned mode;
+   struct strbuf msg = STRBUF_INIT;
+
+   read_cache();
+   time();
+   commit = alloc_commit_node();
+   commit->object.parsed = 1;
+   commit->date = now;
+   parent_tail = >parents;
+
+   if (!resolve_ref_unsafe("HEAD", RESOLVE_REF_READING, head_oid.hash, 
NULL))
+   die("no such ref: HEAD");
+
+   parent_tail = append_parent(parent_tail, _oid);
+   append_merge_parents(parent_tail);
+   verify_working_tree_path(commit, path);
+
+   origin = make_origin(commit, path);
+
+   ident = fmt_ident("Not Committed Yet", "not.committed.yet", NULL, 0);
+   strbuf_addstr(, "tree \n");
+   for (parent = commit->parents; parent; parent = parent->next)
+   strbuf_addf(, "parent %s\n",
+   oid_to_hex(>item->object.oid));
+   strbuf_addf(,
+

[RFC PATCH v2 04/22] blame: move origin and entry structures to header

2017-05-13 Thread Jeff Smith

The origin and blame_entry structures are core to the blame interface
and reference each other. Since origin will be more exposed, rename it
to blame_origin to clarify what it is a part of.

Signed-off-by: Jeff Smith 
---
 blame.h |  86 ++
 builtin/blame.c | 185 +---
 2 files changed, 140 insertions(+), 131 deletions(-)
 create mode 100644 blame.h

diff --git a/blame.h b/blame.h
new file mode 100644
index 000..f52d0fc
--- /dev/null
+++ b/blame.h
@@ -0,0 +1,86 @@
+#ifndef BLAME_H
+#define BLAME_H
+
+#include "cache.h"
+#include "commit.h"
+#include "xdiff-interface.h"
+
+/*
+ * One blob in a commit that is being suspected
+ */
+struct blame_origin {
+   int refcnt;
+   /* Record preceding blame record for this blob */
+   struct blame_origin *previous;
+   /* origins are put in a list linked via `next' hanging off the
+* corresponding commit's util field in order to make finding
+* them fast.  The presence in this chain does not count
+* towards the origin's reference count.  It is tempting to
+* let it count as long as the commit is pending examination,
+* but even under circumstances where the commit will be
+* present multiple times in the priority queue of unexamined
+* commits, processing the first instance will not leave any
+* work requiring the origin data for the second instance.  An
+* interspersed commit changing that would have to be
+* preexisting with a different ancestry and with the same
+* commit date in order to wedge itself between two instances
+* of the same commit in the priority queue _and_ produce
+* blame entries relevant for it.  While we don't want to let
+* us get tripped up by this case, it certainly does not seem
+* worth optimizing for.
+*/
+   struct blame_origin *next;
+   struct commit *commit;
+   /* `suspects' contains blame entries that may be attributed to
+* this origin's commit or to parent commits.  When a commit
+* is being processed, all suspects will be moved, either by
+* assigning them to an origin in a different commit, or by
+* shipping them to the scoreboard's ent list because they
+* cannot be attributed to a different commit.
+*/
+   struct blame_entry *suspects;
+   mmfile_t file;
+   struct object_id blob_oid;
+   unsigned mode;
+   /* guilty gets set when shipping any suspects to the final
+* blame list instead of other commits
+*/
+   char guilty;
+   char path[FLEX_ARRAY];
+};
+
+/*
+ * Each group of lines is described by a blame_entry; it can be split
+ * as we pass blame to the parents.  They are arranged in linked lists
+ * kept as `suspects' of some unprocessed origin, or entered (when the
+ * blame origin has been finalized) into the scoreboard structure.
+ * While the scoreboard structure is only sorted at the end of
+ * processing (according to final image line number), the lists
+ * attached to an origin are sorted by the target line number.
+ */
+struct blame_entry {
+   struct blame_entry *next;
+
+   /* the first line of this group in the final image;
+* internally all line numbers are 0 based.
+*/
+   int lno;
+
+   /* how many lines this group has */
+   int num_lines;
+
+   /* the commit that introduced this group into the final image */
+   struct blame_origin *suspect;
+
+   /* the line number of the first line of this group in the
+* suspect's file; internally all line numbers are 0 based.
+*/
+   int s_lno;
+
+   /* how significant this entry is -- cached to avoid
+* scanning the lines over and over.
+*/
+   unsigned score;
+};
+
+#endif /* BLAME_H */
diff --git a/builtin/blame.c b/builtin/blame.c
index e30b3ef..caa5ee0 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -28,6 +28,7 @@
 #include "line-log.h"
 #include "dir.h"
 #include "progress.h"
+#include "blame.h"
 
 static char blame_usage[] = N_("git blame [] [] [] 
[--] ");
 
@@ -84,50 +85,6 @@ static unsigned blame_copy_score;
 #define METAINFO_SHOWN (1u<<12)
 #define MORE_THAN_ONE_PATH (1u<<13)
 
-/*
- * One blob in a commit that is being suspected
- */
-struct origin {
-   int refcnt;
-   /* Record preceding blame record for this blob */
-   struct origin *previous;
-   /* origins are put in a list linked via `next' hanging off the
-* corresponding commit's util field in order to make finding
-* them fast.  The presence in this chain does not count
-* towards the origin's reference count.  It is tempting to
-* let it count as long as the commit is pending examination,
-* but even under circumstances where the commit will be
-* present multiple times in the

[RFC PATCH v2 08/22] blame: move contents_from to scoreboard

2017-05-13 Thread Jeff Smith

The argument from --contents is used in parts of blame that are being
moved to libgit, and should be accessible via the scoreboard structure.

Signed-off-by: Jeff Smith 
---
 blame.h | 3 +++
 builtin/blame.c | 1 +
 2 files changed, 4 insertions(+)

diff --git a/blame.h b/blame.h
index fde7d1d..388309d 100644
--- a/blame.h
+++ b/blame.h
@@ -125,6 +125,9 @@ struct blame_scoreboard {
 */
unsigned move_score;
unsigned copy_score;
+
+   /* use this file's contents as the final image */
+   const char *contents_from;
 };
 
 #endif /* BLAME_H */
diff --git a/builtin/blame.c b/builtin/blame.c
index 949e179..032fd15 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2617,6 +2617,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
sb.copy_score = BLAME_DEFAULT_COPY_SCORE;
 
sb.revs = 
+   sb.contents_from = contents_from;
if (!reverse) {
final_commit_name = prepare_final();
sb.commits.compare = compare_commits_by_commit_date;
-- 
2.9.3

[RFC PATCH v2 07/22] blame: move copy/move thresholds to scoreboard

2017-05-13 Thread Jeff Smith

Copy and move score thresholds are used in parts of blame that are being
moved to libgit, and should be accessible via the scoreboard structure.

Signed-off-by: Jeff Smith 
---
 blame.h | 10 ++
 builtin/blame.c | 36 
 2 files changed, 26 insertions(+), 20 deletions(-)

diff --git a/blame.h b/blame.h
index 3438052..fde7d1d 100644
--- a/blame.h
+++ b/blame.h
@@ -7,6 +7,9 @@
 #include "revision.h"
 #include "prio-queue.h"
 
+#define BLAME_DEFAULT_MOVE_SCORE   20
+#define BLAME_DEFAULT_COPY_SCORE   40
+
 /*
  * One blob in a commit that is being suspected
  */
@@ -115,6 +118,13 @@ struct blame_scoreboard {
int num_read_blob;
int num_get_patch;
int num_commits;
+
+   /*
+* blame for a blame_entry with score lower than these thresholds
+* is not passed to the parent using move/copy logic.
+*/
+   unsigned move_score;
+   unsigned copy_score;
 };
 
 #endif /* BLAME_H */
diff --git a/builtin/blame.c b/builtin/blame.c
index 12add06..949e179 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -67,14 +67,8 @@ static struct string_list mailmap = STRING_LIST_INIT_NODUP;
 #define PICKAXE_BLAME_COPY_HARDER  04
 #define PICKAXE_BLAME_COPY_HARDEST 010
 
-/*
- * blame for a blame_entry with score lower than these thresholds
- * is not passed to the parent using move/copy logic.
- */
 static unsigned blame_move_score;
 static unsigned blame_copy_score;
-#define BLAME_DEFAULT_MOVE_SCORE   20
-#define BLAME_DEFAULT_COPY_SCORE   40
 
 /* Remember to update object flag allocation in object.h */
 #define METAINFO_SHOWN (1u<<12)
@@ -1047,7 +1041,7 @@ static void find_move_in_parent(struct blame_scoreboard 
*sb,
next = e->next;
find_copy_in_blob(sb, e, parent, split, _p);
if (split[1].suspect &&
-   blame_move_score < ent_score(sb, [1])) {
+   sb->move_score < ent_score(sb, [1])) {
split_blame(blamed, , split, e);
} else {
e->next = leftover;
@@ -1056,7 +1050,7 @@ static void find_move_in_parent(struct blame_scoreboard 
*sb,
decref_split(split);
}
*unblamedtail = NULL;
-   toosmall = filter_small(sb, toosmall, , 
blame_move_score);
+   toosmall = filter_small(sb, toosmall, , 
sb->move_score);
} while (unblamed);
target->suspects = reverse_blame(leftover, NULL);
 }
@@ -1177,7 +1171,7 @@ static void find_copy_in_parent(struct blame_scoreboard 
*sb,
for (j = 0; j < num_ents; j++) {
struct blame_entry *split = blame_list[j].split;
if (split[1].suspect &&
-   blame_copy_score < ent_score(sb, [1])) {
+   sb->copy_score < ent_score(sb, [1])) {
split_blame(blamed, , split,
blame_list[j].ent);
} else {
@@ -1188,7 +1182,7 @@ static void find_copy_in_parent(struct blame_scoreboard 
*sb,
}
free(blame_list);
*unblamedtail = NULL;
-   toosmall = filter_small(sb, toosmall, , 
blame_copy_score);
+   toosmall = filter_small(sb, toosmall, , 
sb->copy_score);
} while (unblamed);
target->suspects = reverse_blame(leftover, NULL);
diff_flush(_opts);
@@ -1345,7 +1339,7 @@ static void pass_blame(struct blame_scoreboard *sb, 
struct blame_origin *origin,
 * Optionally find moves in parents' files.
 */
if (opt & PICKAXE_BLAME_MOVE) {
-   filter_small(sb, , >suspects, 
blame_move_score);
+   filter_small(sb, , >suspects, sb->move_score);
if (origin->suspects) {
for (i = 0, sg = first_scapegoat(revs, commit);
 i < num_sg && sg;
@@ -1364,12 +1358,12 @@ static void pass_blame(struct blame_scoreboard *sb, 
struct blame_origin *origin,
 * Optionally find copies from parents' files.
 */
if (opt & PICKAXE_BLAME_COPY) {
-   if (blame_copy_score > blame_move_score)
-   filter_small(sb, , >suspects, 
blame_copy_score);
-   else if (blame_copy_score < blame_move_score) {
+   if (sb->copy_score > sb->move_score)
+   filter_small(sb, , >suspects, 
sb->copy_score);
+   else if (sb->copy_score < sb->move_score) {
origin->suspects = blame_merge(origin->suspects, 
toosmall);
toosmall = NULL;
-   filter_small(sb, , >suspects, 
blame_copy_score);
+   filter_small(sb, , >suspects, 
sb->copy_score);

[RFC PATCH v2 11/22] blame: move xdl_opts flags to scoreboard

2017-05-13 Thread Jeff Smith

The xdl_opts flags are used in parts of blame that are being moved to
libgit, and should be accessible via the scoreboard structure.

Signed-off-by: Jeff Smith 
---
 blame.h | 1 +
 builtin/blame.c | 7 ---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/blame.h b/blame.h
index 7691256..42a948d 100644
--- a/blame.h
+++ b/blame.h
@@ -132,6 +132,7 @@ struct blame_scoreboard {
/* flags */
int reverse;
int show_root;
+   int xdl_opts;
 };
 
 #endif /* BLAME_H */
diff --git a/builtin/blame.c b/builtin/blame.c
index 77fb71f..d3af02b 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -80,7 +80,7 @@ struct progress_info {
 };
 
 static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
- xdl_emit_hunk_consume_func_t hunk_func, void *cb_data)
+ xdl_emit_hunk_consume_func_t hunk_func, void *cb_data, 
int xdl_opts)
 {
xpparam_t xpp = {0};
xdemitconf_t xecfg = {0};
@@ -823,7 +823,7 @@ static void pass_blame_to_parent(struct blame_scoreboard 
*sb,
fill_origin_blob(>revs->diffopt, target, _o, 
>num_read_blob);
sb->num_get_patch++;
 
-   if (diff_hunks(_p, _o, blame_chunk_cb, ))
+   if (diff_hunks(_p, _o, blame_chunk_cb, , sb->xdl_opts))
die("unable to generate diff (%s -> %s)",
oid_to_hex(>commit->object.oid),
oid_to_hex(>commit->object.oid));
@@ -972,7 +972,7 @@ static void find_copy_in_blob(struct blame_scoreboard *sb,
 * file_p partially may match that image.
 */
memset(split, 0, sizeof(struct blame_entry [3]));
-   if (diff_hunks(file_p, _o, handle_split_cb, ))
+   if (diff_hunks(file_p, _o, handle_split_cb, , sb->xdl_opts))
die("unable to generate diff (%s)",
oid_to_hex(>commit->object.oid));
/* remainder, if any, all match the preimage */
@@ -2762,6 +2762,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
sb.copy_score = blame_copy_score;
 
sb.show_root = show_root;
+   sb.xdl_opts = xdl_opts;
 
read_mailmap(, NULL);
 
-- 
2.9.3

[RFC PATCH v2 05/22] blame: move scoreboard structure to header

2017-05-13 Thread Jeff Smith

The scoreboard structure is core to the blame interface. Since
scoreboard will be more exposed, rename it to blame_scoreboard to
clarify what it is a part of.

Signed-off-by: Jeff Smith 
---
 blame.h | 29 
 builtin/blame.c | 83 +++--
 2 files changed, 57 insertions(+), 55 deletions(-)

diff --git a/blame.h b/blame.h
index f52d0fc..8d6a6fd 100644
--- a/blame.h
+++ b/blame.h
@@ -4,6 +4,8 @@
 #include "cache.h"
 #include "commit.h"
 #include "xdiff-interface.h"
+#include "revision.h"
+#include "prio-queue.h"
 
 /*
  * One blob in a commit that is being suspected
@@ -83,4 +85,31 @@ struct blame_entry {
unsigned score;
 };
 
+/*
+ * The current state of the blame assignment.
+ */
+struct blame_scoreboard {
+   /* the final commit (i.e. where we started digging from) */
+   struct commit *final;
+   /* Priority queue for commits with unassigned blame records */
+   struct prio_queue commits;
+   struct rev_info *revs;
+   const char *path;
+
+   /*
+* The contents in the final image.
+* Used by many functions to obtain contents of the nth line,
+* indexed with scoreboard.lineno[blame_entry.lno].
+*/
+   const char *final_buf;
+   unsigned long final_buf_size;
+
+   /* linked list of blames */
+   struct blame_entry *ent;
+
+   /* look-up a line in the final buffer */
+   int num_lines;
+   int *lineno;
+};
+
 #endif /* BLAME_H */
diff --git a/builtin/blame.c b/builtin/blame.c
index caa5ee0..4a1da53 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -273,41 +273,14 @@ static int compare_commits_by_reverse_commit_date(const 
void *a,
return -compare_commits_by_commit_date(a, b, c);
 }
 
-/*
- * The current state of the blame assignment.
- */
-struct scoreboard {
-   /* the final commit (i.e. where we started digging from) */
-   struct commit *final;
-   /* Priority queue for commits with unassigned blame records */
-   struct prio_queue commits;
-   struct rev_info *revs;
-   const char *path;
-
-   /*
-* The contents in the final image.
-* Used by many functions to obtain contents of the nth line,
-* indexed with scoreboard.lineno[blame_entry.lno].
-*/
-   const char *final_buf;
-   unsigned long final_buf_size;
-
-   /* linked list of blames */
-   struct blame_entry *ent;
-
-   /* look-up a line in the final buffer */
-   int num_lines;
-   int *lineno;
-};
-
-static void sanity_check_refcnt(struct scoreboard *);
+static void sanity_check_refcnt(struct blame_scoreboard *);
 
 /*
  * If two blame entries that are next to each other came from
  * contiguous lines in the same origin (i.e.  pair),
  * merge them together.
  */
-static void coalesce(struct scoreboard *sb)
+static void coalesce(struct blame_scoreboard *sb)
 {
struct blame_entry *ent, *next;
 
@@ -333,7 +306,7 @@ static void coalesce(struct scoreboard *sb)
  * the commit priority queue of the score board.
  */
 
-static void queue_blames(struct scoreboard *sb, struct blame_origin *porigin,
+static void queue_blames(struct blame_scoreboard *sb, struct blame_origin 
*porigin,
 struct blame_entry *sorted)
 {
if (porigin->suspects)
@@ -576,14 +549,14 @@ static void dup_entry(struct blame_entry ***queue,
*queue = >next;
 }
 
-static const char *nth_line(struct scoreboard *sb, long lno)
+static const char *nth_line(struct blame_scoreboard *sb, long lno)
 {
return sb->final_buf + sb->lineno[lno];
 }
 
 static const char *nth_line_cb(void *data, long lno)
 {
-   return nth_line((struct scoreboard *)data, lno);
+   return nth_line((struct blame_scoreboard *)data, lno);
 }
 
 /*
@@ -842,7 +815,7 @@ static int blame_chunk_cb(long start_a, long count_a,
  * for the lines it is suspected to its parent.  Run diff to find
  * which lines came from parent and pass blame for them.
  */
-static void pass_blame_to_parent(struct scoreboard *sb,
+static void pass_blame_to_parent(struct blame_scoreboard *sb,
 struct blame_origin *target,
 struct blame_origin *parent)
 {
@@ -882,7 +855,7 @@ static void pass_blame_to_parent(struct scoreboard *sb,
  *
  * Compute how trivial the lines in the blame_entry are.
  */
-static unsigned ent_score(struct scoreboard *sb, struct blame_entry *e)
+static unsigned ent_score(struct blame_scoreboard *sb, struct blame_entry *e)
 {
unsigned score;
const char *cp, *ep;
@@ -909,7 +882,7 @@ static unsigned ent_score(struct scoreboard *sb, struct 
blame_entry *e)
  * so far, by comparing this and best_so_far and copying this into
  * bst_so_far as needed.
  */
-static void copy_split_if_better(struct scoreboard *sb,
+static void copy_split_if_better(struct blame_scoreboard *sb,

[RFC PATCH v2 10/22] blame: move show_root flag to scoreboard

2017-05-13 Thread Jeff Smith

The show_root flag is used in parts of blame that are being moved to
libgit, and should be accessible via the scoreboard structure.

Signed-off-by: Jeff Smith 
---
 blame.h | 1 +
 builtin/blame.c | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/blame.h b/blame.h
index 85703dd..7691256 100644
--- a/blame.h
+++ b/blame.h
@@ -131,6 +131,7 @@ struct blame_scoreboard {
 
/* flags */
int reverse;
+   int show_root;
 };
 
 #endif /* BLAME_H */
diff --git a/builtin/blame.c b/builtin/blame.c
index ed50eda..77fb71f 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1660,7 +1660,7 @@ static void assign_blame(struct blame_scoreboard *sb, int 
opt)
mark_parents_uninteresting(commit);
}
/* treat root commit as boundary */
-   if (!commit->parents && !show_root)
+   if (!commit->parents && !sb->show_root)
commit->object.flags |= UNINTERESTING;
 
/* Take responsibility for the remaining entries */
@@ -2761,6 +2761,8 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
if (blame_copy_score)
sb.copy_score = blame_copy_score;
 
+   sb.show_root = show_root;
+
read_mailmap(, NULL);
 
assign_blame(, opt);
-- 
2.9.3

[RFC PATCH v2 19/22] blame: move scoreboard-related methods to libgit

2017-05-13 Thread Jeff Smith

Signed-off-by: Jeff Smith 
---
 blame.c | 1313 ++
 blame.h |   11 +
 builtin/blame.c | 1330 +--
 3 files changed, 1330 insertions(+), 1324 deletions(-)

diff --git a/blame.c b/blame.c
index 8c025bc..798e61b 100644
--- a/blame.c
+++ b/blame.c
@@ -1,6 +1,9 @@
 #include "cache.h"
 #include "refs.h"
 #include "cache-tree.h"
+#include "mergesort.h"
+#include "diff.h"
+#include "diffcore.h"
 #include "blame.h"
 
 void blame_origin_decref(struct blame_origin *o)
@@ -261,3 +264,1313 @@ struct commit *fake_working_tree_commit(struct 
diff_options *opt,
 
return commit;
 }
+
+
+
+static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
+ xdl_emit_hunk_consume_func_t hunk_func, void *cb_data, 
int xdl_opts)
+{
+   xpparam_t xpp = {0};
+   xdemitconf_t xecfg = {0};
+   xdemitcb_t ecb = {NULL};
+
+   xpp.flags = xdl_opts;
+   xecfg.hunk_func = hunk_func;
+   ecb.priv = cb_data;
+   return xdi_diff(file_a, file_b, , , );
+}
+
+/*
+ * Given an origin, prepare mmfile_t structure to be used by the
+ * diff machinery
+ */
+static void fill_origin_blob(struct diff_options *opt,
+struct blame_origin *o, mmfile_t *file, int 
*num_read_blob)
+{
+   if (!o->file.ptr) {
+   enum object_type type;
+   unsigned long file_size;
+
+   (*num_read_blob)++;
+   if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV) &&
+   textconv_object(o->path, o->mode, >blob_oid, 1, 
>ptr, _size))
+   ;
+   else
+   file->ptr = read_sha1_file(o->blob_oid.hash, ,
+  _size);
+   file->size = file_size;
+
+   if (!file->ptr)
+   die("Cannot read blob %s for path %s",
+   oid_to_hex(>blob_oid),
+   o->path);
+   o->file = *file;
+   }
+   else
+   *file = o->file;
+}
+
+static void drop_origin_blob(struct blame_origin *o)
+{
+   if (o->file.ptr) {
+   free(o->file.ptr);
+   o->file.ptr = NULL;
+   }
+}
+
+/*
+ * Any merge of blames happens on lists of blames that arrived via
+ * different parents in a single suspect.  In this case, we want to
+ * sort according to the suspect line numbers as opposed to the final
+ * image line numbers.  The function body is somewhat longish because
+ * it avoids unnecessary writes.
+ */
+
+static struct blame_entry *blame_merge(struct blame_entry *list1,
+  struct blame_entry *list2)
+{
+   struct blame_entry *p1 = list1, *p2 = list2,
+   **tail = 
+
+   if (!p1)
+   return p2;
+   if (!p2)
+   return p1;
+
+   if (p1->s_lno <= p2->s_lno) {
+   do {
+   tail = >next;
+   if ((p1 = *tail) == NULL) {
+   *tail = p2;
+   return list1;
+   }
+   } while (p1->s_lno <= p2->s_lno);
+   }
+   for (;;) {
+   *tail = p2;
+   do {
+   tail = >next;
+   if ((p2 = *tail) == NULL)  {
+   *tail = p1;
+   return list1;
+   }
+   } while (p1->s_lno > p2->s_lno);
+   *tail = p1;
+   do {
+   tail = >next;
+   if ((p1 = *tail) == NULL) {
+   *tail = p2;
+   return list1;
+   }
+   } while (p1->s_lno <= p2->s_lno);
+   }
+}
+
+static void *get_next_blame(const void *p)
+{
+   return ((struct blame_entry *)p)->next;
+}
+
+static void set_next_blame(void *p1, void *p2)
+{
+   ((struct blame_entry *)p1)->next = p2;
+}
+
+/*
+ * Final image line numbers are all different, so we don't need a
+ * three-way comparison here.
+ */
+
+static int compare_blame_final(const void *p1, const void *p2)
+{
+   return ((struct blame_entry *)p1)->lno > ((struct blame_entry *)p2)->lno
+   ? 1 : -1;
+}
+
+static int compare_blame_suspect(const void *p1, const void *p2)
+{
+   const struct blame_entry *s1 = p1, *s2 = p2;
+   /*
+* to allow for collating suspects, we sort according to the
+* respective pointer value as the primary sorting criterion.
+* The actual relation is pretty unimportant as long as it
+* establishes a total order.  Comparing as integers gives us
+* that.
+*/
+   if (s1->suspect != s2->suspect)
+   return (intptr_t)s1->suspect > (intptr_t)s2->suspect ? 1 : -1;
+   if (s1->s_lno ==

[RFC PATCH v2 02/22] blame: move textconv_object with related functions

2017-05-13 Thread Jeff Smith

textconv_object is used in places other than blame.c and should be moved
to a more appropriate location.  Other textconv related functions are
located in diff.c so that seems as good a place as any.

Signed-off-by: Jeff Smith 
---
 builtin.h  |  2 --
 builtin/blame.c| 28 
 builtin/cat-file.c |  1 +
 diff.c | 23 +++
 diff.h |  7 +++
 5 files changed, 31 insertions(+), 30 deletions(-)

diff --git a/builtin.h b/builtin.h
index 9e4a898..498ac80 100644
--- a/builtin.h
+++ b/builtin.h
@@ -25,8 +25,6 @@ struct fmt_merge_msg_opts {
 extern int fmt_merge_msg(struct strbuf *in, struct strbuf *out,
 struct fmt_merge_msg_opts *);
 
-extern int textconv_object(const char *path, unsigned mode, const struct 
object_id *oid, int oid_valid, char **buf, unsigned long *buf_size);
-
 extern int is_builtin(const char *s);
 
 extern int cmd_add(int argc, const char **argv, const char *prefix);
diff --git a/builtin/blame.c b/builtin/blame.c
index 42c56eb..c419981 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -147,34 +147,6 @@ static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
 }
 
 /*
- * Prepare diff_filespec and convert it using diff textconv API
- * if the textconv driver exists.
- * Return 1 if the conversion succeeds, 0 otherwise.
- */
-int textconv_object(const char *path,
-   unsigned mode,
-   const struct object_id *oid,
-   int oid_valid,
-   char **buf,
-   unsigned long *buf_size)
-{
-   struct diff_filespec *df;
-   struct userdiff_driver *textconv;
-
-   df = alloc_filespec(path);
-   fill_filespec(df, oid->hash, oid_valid, mode);
-   textconv = get_textconv(df);
-   if (!textconv) {
-   free_filespec(df);
-   return 0;
-   }
-
-   *buf_size = fill_textconv(textconv, df, buf);
-   free_filespec(df);
-   return 1;
-}
-
-/*
  * Given an origin, prepare mmfile_t structure to be used by the
  * diff machinery
  */
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 1890d7a..79a2c82 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -5,6 +5,7 @@
  */
 #include "cache.h"
 #include "builtin.h"
+#include "diff.h"
 #include "parse-options.h"
 #include "userdiff.h"
 #include "streaming.h"
diff --git a/diff.c b/diff.c
index 74283d9..040fb25 100644
--- a/diff.c
+++ b/diff.c
@@ -5270,6 +5270,29 @@ size_t fill_textconv(struct userdiff_driver *driver,
return size;
 }
 
+int textconv_object(const char *path,
+   unsigned mode,
+   const struct object_id *oid,
+   int oid_valid,
+   char **buf,
+   unsigned long *buf_size)
+{
+   struct diff_filespec *df;
+   struct userdiff_driver *textconv;
+
+   df = alloc_filespec(path);
+   fill_filespec(df, oid->hash, oid_valid, mode);
+   textconv = get_textconv(df);
+   if (!textconv) {
+   free_filespec(df);
+   return 0;
+   }
+
+   *buf_size = fill_textconv(textconv, df, buf);
+   free_filespec(df);
+   return 1;
+}
+
 void setup_diff_pager(struct diff_options *opt)
 {
/*
diff --git a/diff.h b/diff.h
index 5be1ee7..52ebd54 100644
--- a/diff.h
+++ b/diff.h
@@ -385,6 +385,13 @@ extern size_t fill_textconv(struct userdiff_driver *driver,
  */
 extern struct userdiff_driver *get_textconv(struct diff_filespec *one);
 
+/*
+ * Prepare diff_filespec and convert it using diff textconv API
+ * if the textconv driver exists.
+ * Return 1 if the conversion succeeds, 0 otherwise.
+ */
+extern int textconv_object(const char *path, unsigned mode, const struct 
object_id *oid, int oid_valid, char **buf, unsigned long *buf_size);
+
 extern int parse_rename_score(const char **cp_p);
 
 extern long parse_algorithm_value(const char *value);
-- 
2.9.3

[RFC PATCH v2 06/22] blame: move stat counters to scoreboard

2017-05-13 Thread Jeff Smith

Statistic counters are used in parts of blame that are being moved to
libgit, and should be accessible via the scoreboard structure.

Signed-off-by: Jeff Smith 
---
 blame.h |  5 +
 builtin/blame.c | 29 -
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/blame.h b/blame.h
index 8d6a6fd..3438052 100644
--- a/blame.h
+++ b/blame.h
@@ -110,6 +110,11 @@ struct blame_scoreboard {
/* look-up a line in the final buffer */
int num_lines;
int *lineno;
+
+   /* stats */
+   int num_read_blob;
+   int num_get_patch;
+   int num_commits;
 };
 
 #endif /* BLAME_H */
diff --git a/builtin/blame.c b/builtin/blame.c
index 4a1da53..12add06 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -62,11 +62,6 @@ static struct string_list mailmap = STRING_LIST_INIT_NODUP;
 #define DEBUG 0
 #endif
 
-/* stats */
-static int num_read_blob;
-static int num_get_patch;
-static int num_commits;
-
 #define PICKAXE_BLAME_MOVE 01
 #define PICKAXE_BLAME_COPY 02
 #define PICKAXE_BLAME_COPY_HARDER  04
@@ -108,13 +103,13 @@ static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
  * diff machinery
  */
 static void fill_origin_blob(struct diff_options *opt,
-struct blame_origin *o, mmfile_t *file)
+struct blame_origin *o, mmfile_t *file, int 
*num_read_blob)
 {
if (!o->file.ptr) {
enum object_type type;
unsigned long file_size;
 
-   num_read_blob++;
+   (*num_read_blob)++;
if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV) &&
textconv_object(o->path, o->mode, >blob_oid, 1, 
>ptr, _size))
;
@@ -830,9 +825,9 @@ static void pass_blame_to_parent(struct blame_scoreboard 
*sb,
d.offset = 0;
d.dstq =  d.srcq = >suspects;
 
-   fill_origin_blob(>revs->diffopt, parent, _p);
-   fill_origin_blob(>revs->diffopt, target, _o);
-   num_get_patch++;
+   fill_origin_blob(>revs->diffopt, parent, _p, 
>num_read_blob);
+   fill_origin_blob(>revs->diffopt, target, _o, 
>num_read_blob);
+   sb->num_get_patch++;
 
if (diff_hunks(_p, _o, blame_chunk_cb, ))
die("unable to generate diff (%s -> %s)",
@@ -1036,7 +1031,7 @@ static void find_move_in_parent(struct blame_scoreboard 
*sb,
if (!unblamed)
return; /* nothing remains for this target */
 
-   fill_origin_blob(>revs->diffopt, parent, _p);
+   fill_origin_blob(>revs->diffopt, parent, _p, 
>num_read_blob);
if (!file_p.ptr)
return;
 
@@ -1165,7 +1160,7 @@ static void find_copy_in_parent(struct blame_scoreboard 
*sb,
norigin = get_origin(parent, p->one->path);
oidcpy(>blob_oid, >one->oid);
norigin->mode = p->one->mode;
-   fill_origin_blob(>revs->diffopt, norigin, _p);
+   fill_origin_blob(>revs->diffopt, norigin, _p, 
>num_read_blob);
if (!file_p.ptr)
continue;
 
@@ -1330,7 +1325,7 @@ static void pass_blame(struct blame_scoreboard *sb, 
struct blame_origin *origin,
}
}
 
-   num_commits++;
+   sb->num_commits++;
for (i = 0, sg = first_scapegoat(revs, commit);
 i < num_sg && sg;
 sg = sg->next, i++) {
@@ -2714,7 +2709,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
oid_to_hex(>blob_oid),
path);
}
-   num_read_blob++;
+   sb.num_read_blob++;
lno = prepare_lines();
 
if (lno && !range_list.nr)
@@ -2795,9 +2790,9 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
}
 
if (show_stats) {
-   printf("num read blob: %d\n", num_read_blob);
-   printf("num get patch: %d\n", num_get_patch);
-   printf("num commits: %d\n", num_commits);
+   printf("num read blob: %d\n", sb.num_read_blob);
+   printf("num get patch: %d\n", sb.num_get_patch);
+   printf("num commits: %d\n", sb.num_commits);
}
return 0;
 }
-- 
2.9.3

[RFC PATCH v2 09/22] blame: move reverse flag to scoreboard

2017-05-13 Thread Jeff Smith

The reverse flag is used in parts of blame that are being moved to
libgit, and should be accessible via the scoreboard structure.

Signed-off-by: Jeff Smith 
---
 blame.h |  3 +++
 builtin/blame.c | 20 +++-
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/blame.h b/blame.h
index 388309d..85703dd 100644
--- a/blame.h
+++ b/blame.h
@@ -128,6 +128,9 @@ struct blame_scoreboard {
 
/* use this file's contents as the final image */
const char *contents_from;
+
+   /* flags */
+   int reverse;
 };
 
 #endif /* BLAME_H */
diff --git a/builtin/blame.c b/builtin/blame.c
index 032fd15..ed50eda 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1218,7 +1218,8 @@ static void pass_whole_blame(struct blame_scoreboard *sb,
  * "parent" (and "porigin"), but what we mean is to find scapegoat to
  * exonerate ourselves.
  */
-static struct commit_list *first_scapegoat(struct rev_info *revs, struct 
commit *commit)
+static struct commit_list *first_scapegoat(struct rev_info *revs, struct 
commit *commit,
+  int reverse)
 {
if (!reverse) {
if (revs->first_parent_only &&
@@ -1232,9 +1233,9 @@ static struct commit_list *first_scapegoat(struct 
rev_info *revs, struct commit
return lookup_decoration(>children, >object);
 }
 
-static int num_scapegoats(struct rev_info *revs, struct commit *commit)
+static int num_scapegoats(struct rev_info *revs, struct commit *commit, int 
reverse)
 {
-   struct commit_list *l = first_scapegoat(revs, commit);
+   struct commit_list *l = first_scapegoat(revs, commit, reverse);
return commit_list_count(l);
 }
 
@@ -1272,7 +1273,7 @@ static void pass_blame(struct blame_scoreboard *sb, 
struct blame_origin *origin,
struct blame_entry *toosmall = NULL;
struct blame_entry *blames, **blametail = 
 
-   num_sg = num_scapegoats(revs, commit);
+   num_sg = num_scapegoats(revs, commit, sb->reverse);
if (!num_sg)
goto finish;
else if (num_sg < ARRAY_SIZE(sg_buf))
@@ -1288,7 +1289,7 @@ static void pass_blame(struct blame_scoreboard *sb, 
struct blame_origin *origin,
struct blame_origin *(*find)(struct commit *, struct 
blame_origin *);
find = pass ? find_rename : find_origin;
 
-   for (i = 0, sg = first_scapegoat(revs, commit);
+   for (i = 0, sg = first_scapegoat(revs, commit, sb->reverse);
 i < num_sg && sg;
 sg = sg->next, i++) {
struct commit *p = sg->item;
@@ -1320,7 +1321,7 @@ static void pass_blame(struct blame_scoreboard *sb, 
struct blame_origin *origin,
}
 
sb->num_commits++;
-   for (i = 0, sg = first_scapegoat(revs, commit);
+   for (i = 0, sg = first_scapegoat(revs, commit, sb->reverse);
 i < num_sg && sg;
 sg = sg->next, i++) {
struct blame_origin *porigin = sg_origin[i];
@@ -1341,7 +1342,7 @@ static void pass_blame(struct blame_scoreboard *sb, 
struct blame_origin *origin,
if (opt & PICKAXE_BLAME_MOVE) {
filter_small(sb, , >suspects, sb->move_score);
if (origin->suspects) {
-   for (i = 0, sg = first_scapegoat(revs, commit);
+   for (i = 0, sg = first_scapegoat(revs, commit, 
sb->reverse);
 i < num_sg && sg;
 sg = sg->next, i++) {
struct blame_origin *porigin = sg_origin[i];
@@ -1368,7 +1369,7 @@ static void pass_blame(struct blame_scoreboard *sb, 
struct blame_origin *origin,
if (!origin->suspects)
goto finish;
 
-   for (i = 0, sg = first_scapegoat(revs, commit);
+   for (i = 0, sg = first_scapegoat(revs, commit, sb->reverse);
 i < num_sg && sg;
 sg = sg->next, i++) {
struct blame_origin *porigin = sg_origin[i];
@@ -1649,7 +1650,7 @@ static void assign_blame(struct blame_scoreboard *sb, int 
opt)
 */
origin_incref(suspect);
parse_commit(commit);
-   if (reverse ||
+   if (sb->reverse ||
(!(commit->object.flags & UNINTERESTING) &&
 !(revs->max_age != -1 && commit->date < revs->max_age)))
pass_blame(sb, suspect, opt);
@@ -2618,6 +2619,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
 
sb.revs = 
sb.contents_from = contents_from;
+   sb.reverse = reverse;
if (!reverse) {
final_commit_name = prepare_final();
sb.commits.compare = compare_commits_by_commit_date;
-- 
2.9.3

[RFC PATCH v2 00/22] Add blame to libgit

2017-05-13 Thread Jeff Smith

Rather than duplicate large portions of builtin/blame.c in cgit, it
would be better to shift its core functionality into libgit.a.  The
functionality left in builtin/blame.c mostly relates to terminal
presentation.

Since initial patchset:
  Made commit titles consistent
  Broke some commits into more atomic pieces
  Fleshed out commit message bodies
  Made public structure and method names more clearly blame-related

Jeff Smith (22):
  blame: remove unneeded dependency on blob.h
  blame: move textconv_object with related functions
  blame: remove unused parameters
  blame: move origin and entry structures to header
  blame: move scoreboard structure to header
  blame: move stat counters to scoreboard
  blame: move copy/move thresholds to scoreboard
  blame: move contents_from to scoreboard
  blame: move reverse flag to scoreboard
  blame: move show_root flag to scoreboard
  blame: move xdl_opts flags to scoreboard
  blame: move no_whole_file_rename flag to scoreboard
  blame: make sanity_check use a callback in scoreboard
  blame: move progess updates to a scoreboard callback
  blame: wrap blame_sort and compare_blame_final
  blame: rework methods that determine 'final' commit
  blame: move origin-related methods to libgit
  blame: move fake-commit-related methods to libgit
  blame: move scoreboard-related methods to libgit
  blame: create scoreboard init function in libgit
  blame: create scoreboard setup function in libgit
  blame: create entry prepend function in libgit

 Makefile   |1 +
 blame.c| 1863 +
 blame.h|  175 +
 builtin.h  |2 -
 builtin/blame.c| 2130 ++--
 builtin/cat-file.c |1 +
 diff.c |   23 +
 diff.h |7 +
 8 files changed, 2143 insertions(+), 2059 deletions(-)
 create mode 100644 blame.c
 create mode 100644 blame.h

-- 
2.9.3

[RFC PATCH v2 03/22] blame: remove unused parameters

2017-05-13 Thread Jeff Smith

Clean up blame code before moving it into libgit

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index c419981..e30b3ef 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -449,9 +449,7 @@ static struct origin *make_origin(struct commit *commit, 
const char *path)
  * Locate an existing origin or create a new one.
  * This moves the origin to front position in the commit util list.
  */
-static struct origin *get_origin(struct scoreboard *sb,
-struct commit *commit,
-const char *path)
+static struct origin *get_origin(struct commit *commit, const char *path)
 {
struct origin *o, *l;
 
@@ -499,8 +497,7 @@ static int fill_blob_sha1_and_mode(struct origin *origin)
  * We have an origin -- check if the same path exists in the
  * parent and return an origin structure to represent it.
  */
-static struct origin *find_origin(struct scoreboard *sb,
- struct commit *parent,
+static struct origin *find_origin(struct commit *parent,
  struct origin *origin)
 {
struct origin *porigin;
@@ -543,7 +540,7 @@ static struct origin *find_origin(struct scoreboard *sb,
 
if (!diff_queued_diff.nr) {
/* The path is the same as parent */
-   porigin = get_origin(sb, parent, origin->path);
+   porigin = get_origin(parent, origin->path);
oidcpy(>blob_oid, >blob_oid);
porigin->mode = origin->mode;
} else {
@@ -569,7 +566,7 @@ static struct origin *find_origin(struct scoreboard *sb,
die("internal error in blame::find_origin (%c)",
p->status);
case 'M':
-   porigin = get_origin(sb, parent, origin->path);
+   porigin = get_origin(parent, origin->path);
oidcpy(>blob_oid, >one->oid);
porigin->mode = p->one->mode;
break;
@@ -588,8 +585,7 @@ static struct origin *find_origin(struct scoreboard *sb,
  * We have an origin -- find the path that corresponds to it in its
  * parent and return an origin structure to represent it.
  */
-static struct origin *find_rename(struct scoreboard *sb,
- struct commit *parent,
+static struct origin *find_rename(struct commit *parent,
  struct origin *origin)
 {
struct origin *porigin = NULL;
@@ -615,7 +611,7 @@ static struct origin *find_rename(struct scoreboard *sb,
struct diff_filepair *p = diff_queued_diff.queue[i];
if ((p->status == 'R' || p->status == 'C') &&
!strcmp(p->two->path, origin->path)) {
-   porigin = get_origin(sb, parent, p->one->path);
+   porigin = get_origin(parent, p->one->path);
oidcpy(>blob_oid, >one->oid);
porigin->mode = p->one->mode;
break;
@@ -1270,7 +1266,7 @@ static void find_copy_in_parent(struct scoreboard *sb,
/* find_move already dealt with this path */
continue;
 
-   norigin = get_origin(sb, parent, p->one->path);
+   norigin = get_origin(parent, p->one->path);
oidcpy(>blob_oid, >one->oid);
norigin->mode = p->one->mode;
fill_origin_blob(>revs->diffopt, norigin, _p);
@@ -1404,8 +1400,7 @@ static void pass_blame(struct scoreboard *sb, struct 
origin *origin, int opt)
 * common cases, then we look for renames in the second pass.
 */
for (pass = 0; pass < 2 - no_whole_file_rename; pass++) {
-   struct origin *(*find)(struct scoreboard *,
-  struct commit *, struct origin *);
+   struct origin *(*find)(struct commit *, struct origin *);
find = pass ? find_rename : find_origin;
 
for (i = 0, sg = first_scapegoat(revs, commit);
@@ -1418,7 +1413,7 @@ static void pass_blame(struct scoreboard *sb, struct 
origin *origin, int opt)
continue;
if (parse_commit(p))
continue;
-   porigin = find(sb, p, origin);
+   porigin = find(p, origin);
if (!porigin)
continue;
if (!oidcmp(>blob_oid, >blob_oid)) {
@@ -2806,7 +2801,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
sb.final_buf_size = o->file.size;
}
else {
-   o = get_origin(, sb.final, path);
+

Re: [PATCH] usage.c: drop set_error_handle()

2017-05-13 Thread Brandon Williams

On 05/12, Jeff King wrote:
> The set_error_handle() function was introduced by 3b331e926
> (vreportf: report to arbitrary filehandles, 2015-08-11) so
> that run-command could send post-fork, pre-exec errors to
> the parent's original stderr.
> 
> That use went away in 79319b194 (run-command: eliminate
> calls to error handling functions in child, 2017-04-19),
> which pushes all of the error reporting to the parent.
> This leaves no callers of set_error_handle(). As we're not
> likely to add any new ones, let's drop it.
> 
> Signed-off-by: Jeff King 

Looks good to me!

Thanks

-- 
Brandon Williams

[PATCH v2 7/7] grep: add support for PCRE v2

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Add support for v2 of the PCRE API. This is a new major version of
PCRE that came out in early 2015[1].

The regular expression syntax is the same, but while the API is
similar, pretty much every function is either renamed or takes
different arguments. Thus using it via entirely new functions makes
sense, as opposed to trying to e.g. have one compile_pcre_pattern()
that would call either PCRE v1 or v2 functions.

Git can now be compiled with either USE_LIBPCRE1=YesPlease or
USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a
synonym for the former. Providing both is a compile-time error.

With earlier patches to enable JIT for PCRE v1 the performance of the
release versions of both libraries is almost exactly the same, with
PCRE v2 being around 1% slower.

However after I reported this to the pcre-dev mailing list[2] I got a
lot of help with the API use from Zoltán Herczeg, he subsequently
optimized some of the JIT functionality in v2 of the library.

Running the p7820-grep-engines.sh performance test against the latest
Subversion trunk of both, with both them and git compiled as -O3, and
the test run against linux.git, gives the following results. Just the
/perl/ tests shown:

$ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux 
GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 
USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc 
NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst 
LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 
USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease 
CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst 
LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~2 HEAD~ HEAD 
p7820-grep-engines.sh
[...]
Test   HEAD~2HEAD~  
  HEAD


7820.3: perl grep how.to  0.19(0.34+0.62)   
0.18(0.39+0.57) -5.3%   0.19(0.32+0.61) +0.0%
7820.7: perl grep ^how to 0.21(0.68+0.51)   
0.21(0.64+0.54) +0.0%   0.19(0.32+0.60) -9.5%
7820.11: perl grep [how] to   0.25(0.92+0.51)   
0.26(0.93+0.49) +4.0%   0.21(0.39+0.62) -16.0%
7820.15: perl grep (e.t[^ ]*|v.ry) rare   0.26(1.18+0.42)   
0.26(1.14+0.45) +0.0%   0.20(0.49+0.57) -23.1%
7820.19: perl grep m(ú|u)lt.b(æ|y)te  0.24(0.85+0.48)   
0.23(0.92+0.41) -4.2%   0.19(0.36+0.56) -20.8%

See commit ("perf: add a performance comparison test of grep -G, -E
and -P", 2017-04-19) for further details on the machine the above test
run was executed on.

Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with
JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine
mentioning p7820-grep-engines.sh for more details on the test setup.

For ease of readability, a different run just of HEAD~ (PCRE v1 with
JIT v.s. PCRE v2), again with just the /perl/ tests shown:

Test   HEAD~ HEAD

---
7820.3: perl grep how.to  0.19(0.40+0.56)   
0.19(0.34+0.59) +0.0%
7820.7: perl grep ^how to 0.21(0.64+0.54)   
0.19(0.30+0.63) -9.5%
7820.11: perl grep [how] to   0.25(0.94+0.48)   
0.21(0.38+0.62) -16.0%
7820.15: perl grep (e.t[^ ]*|v.ry) rare   0.26(1.13+0.46)   
0.20(0.48+0.58) -23.1%
7820.19: perl grep m(ú|u)lt.b(æ|y)te  0.23(0.84+0.50)   
0.18(0.29+0.63) -21.7%

I.e. the two are either neck-to-neck, but PCRE v2 usually pulls ahead,
when it does it's around 20% faster.

A brief note on thread safety: As noted in pcre2api(3) & pcre2jit(3)
the compiled pattern can be shared between threads, but not some of
the JIT context, however the grep threading support does all pattern &
JIT compilation in separate threads, so this code doesn't need to
concern itself with thread safety.

See commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) for the
initial addition of PCRE v1. This change follows some of the same
patterns it did (and which were discussed on list at the time),
e.g. mocking up types with typedef instead of ifdef-ing them out when
USE_LIBPCRE2 isn't defined. This adds some trivial memory use to the
program, but makes the code look nicer.

1. https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html
2. https://lists.exim.org/lurker/thread/20170419.172322.833ee099.en.html

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile  |  30 +---
 configure.ac  |  77 ++-
 grep.c| 143 ++
 grep.h|  17 +++
 t/test-lib.sh |   2 +-
 5 files changed, 250 insertions(+), 19 deletions(-)

diff --git a/Makefile b/Makefile
index

[PATCH v2 2/7] grep: skip pthreads overhead when using one thread

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Skip the administrative overhead of using pthreads when only using one
thread. Instead take the non-threaded path which would be taken under
NO_PTHREADS.

The threading support was initially added in commit
5b594f457a ("Threaded grep", 2010-01-25) with a hardcoded compile-time
number of 8 threads. Later the number of threads was made configurable
in commit 89f09dd34e ("grep: add --threads= option and
grep.threads configuration", 2015-12-15).

That change did not add any special handling for --threads=1. Now we
take a slightly faster path by skipping thread handling entirely when
1 thread is requested.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/grep.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/grep.c b/builtin/grep.c
index 12e62fcbf3..bd008cb100 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1238,6 +1238,8 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
num_threads = GREP_NUM_THREADS_DEFAULT;
else if (num_threads < 0)
die(_("invalid number of threads specified (%d)"), num_threads);
+   if (num_threads == 1)
+   num_threads = 0;
 #else
if (num_threads)
warning(_("no threads support, ignoring --threads"));
-- 
2.11.0

[PATCH v2 5/7] grep: un-break building with PCRE < 8.32

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Amend my change earlier in this series ("grep: add support for the
PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1
versions earlier than 8.32.

The JIT support was added in version 8.20 released on 2011-10-21, but
it wasn't until 8.32 released on 2012-11-30 that the fast code path to
use the JIT via pcre_jit_exec() was added[1] (see also [2]).

This means that versions 8.20 through 8.31 could still use the JIT,
but supporting it on those versions would add to the already verbose
macro soup around JIT support it, and I don't expect that the use-case
of compiling a brand new git against a 5 year old PCRE is particularly
common, and if someone does that they can just get the existing
pre-JIT slow codepath.

So just take the easy way out and disable the JIT on any version older
than 8.32.

The reason this change isn't part of the initial change PCRE JIT
support is because possibly slightly annoying someone who's bisecting
with an ancient PCRE is worth it to have a cleaner history showing
which parts of the implementation are only used for ancient PCRE
versions. This also makes it easier to revert this change if we ever
decide to stop supporting those old versions.

1. http://www.pcre.org/original/changelog.txt ("28. Introducing a
   native interface for JIT. Through this interface, the
   compiled[...]")
2. https://bugs.exim.org/show_bug.cgi?id=2121

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.c | 8 
 grep.h | 5 +
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/grep.c b/grep.c
index accf1c45e6..81337638ca 100644
--- a/grep.c
+++ b/grep.c
@@ -351,7 +351,7 @@ static void compile_pcre1_regexp(struct grep_pat *p, const 
struct grep_opt *opt)
const char *error;
int erroffset;
int options = PCRE_MULTILINE;
-#ifdef PCRE_CONFIG_JIT
+#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT
int canjit;
 #endif
 
@@ -372,7 +372,7 @@ static void compile_pcre1_regexp(struct grep_pat *p, const 
struct grep_opt *opt)
if (!p->pcre1_extra_info && error)
die("%s", error);
 
-#ifdef PCRE_CONFIG_JIT
+#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT
pcre_config(PCRE_CONFIG_JIT, );
if (canjit == 1) {
p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024);
@@ -392,7 +392,7 @@ static int pcre1match(struct grep_pat *p, const char *line, 
const char *eol,
if (eflags & REG_NOTBOL)
flags |= PCRE_NOTBOL;
 
-#ifdef PCRE_CONFIG_JIT
+#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT
if (p->pcre1_jit_on)
ret = pcre_jit_exec(p->pcre1_regexp, p->pcre1_extra_info, line,
eol - line, 0, flags, ovector,
@@ -420,7 +420,7 @@ static int pcre1match(struct grep_pat *p, const char *line, 
const char *eol,
 static void free_pcre1_regexp(struct grep_pat *p)
 {
pcre_free(p->pcre1_regexp);
-#ifdef PCRE_CONFIG_JIT
+#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT
if (p->pcre1_jit_on) {
pcre_free_study(p->pcre1_extra_info);
pcre_jit_stack_free(p->pcre1_jit_stack);
diff --git a/grep.h b/grep.h
index 14f47189f9..73ef0ef8ec 100644
--- a/grep.h
+++ b/grep.h
@@ -3,6 +3,11 @@
 #include "color.h"
 #ifdef USE_LIBPCRE1
 #include 
+#ifdef PCRE_CONFIG_JIT
+#if PCRE_MAJOR >= 8 && PCRE_MINOR >= 32
+#define GIT_PCRE1_CAN_DO_MODERN_JIT
+#endif
+#endif
 #ifndef PCRE_STUDY_JIT_COMPILE
 #define PCRE_STUDY_JIT_COMPILE 0
 #endif
-- 
2.11.0

[PATCH v2 6/7] grep: un-break building with PCRE < 8.20

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Amend my change earlier in this series ("grep: add support for the
PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1
versions earlier than 8.20.

The 8.20 release was the first release to have JIT & pcre_jit_stack in
the headers, so a mock type needs to be provided for it on those
releases.

Now git should compile with all PCRE versions that it supported before
my JIT change.

I've tested it as far back as version 7.5 released on 2008-01-10, once
I got down to version 7.0 it wouldn't build anymore with GCC 7.1.1,
and I couldn't be bothered to anything older than 7.5 as I'm confident
that if the build breaks on those older versions it's not because of
my JIT change.

See the "un-break" change in this series ("grep: un-break building
with PCRE < 8.32", 2017-05-10) for why this isn't squashed into the
main PCRE JIT commit.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/grep.h b/grep.h
index 73ef0ef8ec..b7b9d487b0 100644
--- a/grep.h
+++ b/grep.h
@@ -11,6 +11,9 @@
 #ifndef PCRE_STUDY_JIT_COMPILE
 #define PCRE_STUDY_JIT_COMPILE 0
 #endif
+#if PCRE_MAJOR <= 8 && PCRE_MINOR < 20
+typedef int pcre_jit_stack;
+#endif
 #else
 typedef int pcre;
 typedef int pcre_extra;
-- 
2.11.0

[PATCH v2 4/7] grep: add support for the PCRE v1 JIT API

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Change the grep PCRE v1 code to use JIT when available. When PCRE
support was initially added in commit 63e7e9d8b6 ("git-grep: Learn
PCRE", 2011-05-09) PCRE had no JIT support, it was integrated into
8.20 released on 2011-10-21.

Enabling JIT support usually improves performance by more than
40%. The pattern compilation times are relatively slower, but those
relative numbers are tiny, and are easily made back in all but the
most trivial cases of grep. Detailed benchmarks & overview of
compilation times is at: http://sljit.sourceforge.net/pcre.html

With this change the difference in a t/perf/p7820-grep-engines.sh run
is, with just the /perl/ tests shown:

$ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux 
GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc 
NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst 
LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~ HEAD 
p7820-grep-engines.sh
Test   HEAD~ HEAD

---
7820.3: perl grep how.to  0.27(1.18+0.42)   
0.19(0.30+0.68) -29.6%
7820.7: perl grep ^how to 0.48(2.74+0.41)   
0.21(0.64+0.55) -56.2%
7820.11: perl grep [how] to   0.47(2.56+0.47)   
0.26(0.86+0.56) -44.7%
7820.15: perl grep (e.t[^ ]*|v.ry) rare   0.88(5.79+0.38)   
0.26(1.06+0.54) -70.5%
7820.19: perl grep m(ú|u)lt.b(æ|y)te  0.32(1.56+0.46)   
0.23(0.86+0.48) -28.1%

The conditional support for JIT is implemented as suggested in the
pcrejit(3) man page. E.g. defining PCRE_STUDY_JIT_COMPILE to 0 if it's
not present.

The implementation is relatively verbose because even if
PCRE_CONFIG_JIT is defined only a call to pcre_config() can determine
if the JIT is available, and if so the faster pcre_jit_exec() function
should be called instead of pcre_exec(), and a different (but not
complimentary!) function needs to be called to free pcre1_extra_info.

There's no graceful fallback if pcre_jit_stack_alloc() fails under
PCRE_CONFIG_JIT, instead the program will simply abort. I don't think
this is worth handling gracefully, it'll only fail in cases where
malloc() doesn't work, in which case we're screwed anyway.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.c | 37 -
 grep.h |  6 ++
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 1157529115..accf1c45e6 100644
--- a/grep.c
+++ b/grep.c
@@ -351,6 +351,9 @@ static void compile_pcre1_regexp(struct grep_pat *p, const 
struct grep_opt *opt)
const char *error;
int erroffset;
int options = PCRE_MULTILINE;
+#ifdef PCRE_CONFIG_JIT
+   int canjit;
+#endif
 
if (opt->ignore_case) {
if (has_non_ascii(p->pattern))
@@ -365,9 +368,20 @@ static void compile_pcre1_regexp(struct grep_pat *p, const 
struct grep_opt *opt)
if (!p->pcre1_regexp)
compile_regexp_failed(p, error);
 
-   p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, );
+   p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 
PCRE_STUDY_JIT_COMPILE, );
if (!p->pcre1_extra_info && error)
die("%s", error);
+
+#ifdef PCRE_CONFIG_JIT
+   pcre_config(PCRE_CONFIG_JIT, );
+   if (canjit == 1) {
+   p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024);
+   if (!p->pcre1_jit_stack)
+   die("BUG: Couldn't allocate PCRE JIT stack");
+   pcre_assign_jit_stack(p->pcre1_extra_info, NULL, 
p->pcre1_jit_stack);
+   p->pcre1_jit_on = 1;
+   }
+#endif
 }
 
 static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
@@ -378,8 +392,20 @@ static int pcre1match(struct grep_pat *p, const char 
*line, const char *eol,
if (eflags & REG_NOTBOL)
flags |= PCRE_NOTBOL;
 
+#ifdef PCRE_CONFIG_JIT
+   if (p->pcre1_jit_on)
+   ret = pcre_jit_exec(p->pcre1_regexp, p->pcre1_extra_info, line,
+   eol - line, 0, flags, ovector,
+   ARRAY_SIZE(ovector), p->pcre1_jit_stack);
+   else
+   ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line,
+   eol - line, 0, flags, ovector,
+   ARRAY_SIZE(ovector));
+#else
ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line,
0, flags, ovector, ARRAY_SIZE(ovector));
+#endif
+
if (ret < 0 && ret != PCRE_ERROR_NOMATCH)
die("pcre_exec failed with error code %d", ret);
if (ret > 0) {
@@ -394,7 +420,16 @@ static int pcre1match(struct grep_pat *p, const char 
*line, const char *eol,
 static void free_pcre1_regexp(struct grep_pat *p)
 {

[PATCH v2 3/7] log: add -P as a synonym for --perl-regexp

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Add a short -P option as a synonym for the longer --perl-regexp, for
consistency with the options the corresponding grep invocations
accept.

This was intentionally omitted in commit 727b6fc3ed ("log --grep:
accept --basic-regexp and --perl-regexp", 2012-10-03) for unspecified
future use.

Make it consistent with "grep" rather than to keep it open for future
use, and to avoid the confusion of -P meaning different things for
grep & log, as is the case with the -G option.

As noted in the aforementioned commit the --basic-regexp option can't
have a corresponding -G argument, as the log command already uses that
for -G.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/rev-list-options.txt |  1 +
 revision.c |  2 +-
 t/t4202-log.sh | 12 
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index a46f70c2b1..9c44eae55d 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -91,6 +91,7 @@ endif::git-rev-list[]
Consider the limiting patterns to be fixed strings (don't interpret
pattern as a regular expression).
 
+-P::
 --perl-regexp::
Consider the limiting patterns to be Perl-compatible regular
expressions.
diff --git a/revision.c b/revision.c
index 7ff61ff5f7..03a3a012de 100644
--- a/revision.c
+++ b/revision.c
@@ -1995,7 +1995,7 @@ static int handle_revision_opt(struct rev_info *revs, int 
argc, const char **arg
DIFF_OPT_SET(>diffopt, PICKAXE_IGNORE_CASE);
} else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) {
revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_FIXED;
-   } else if (!strcmp(arg, "--perl-regexp")) {
+   } else if (!strcmp(arg, "--perl-regexp") || !strcmp(arg, "-P")) {
revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_PCRE;
} else if (!strcmp(arg, "--all-match")) {
revs->grep_filter.all_match = 1;
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 9680dfe400..5c05c650cd 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -350,8 +350,20 @@ test_expect_success 'log with various grep.patternType 
configurations & command-
--grep="(1|2)" >actual.fixed.short-arg &&
git log --pretty=tformat:%s -E \
--grep="\|2" >actual.extended.short-arg &&
+   if test_have_prereq PCRE
+   then
+   git log --pretty=tformat:%s -P \
+   --grep="[\d]\|" >actual.perl.short-arg
+   else
+   test_must_fail git log -P \
+   --grep="[\d]\|"
+   fi &&
test_cmp expect.fixed actual.fixed.short-arg &&
test_cmp expect.extended actual.extended.short-arg &&
+   if test_have_prereq PCRE
+   then
+   test_cmp expect.perl actual.perl.short-arg
+   fi &&
 
git log --pretty=tformat:%s --fixed-strings \
--grep="(1|2)" >actual.fixed.long-arg &&
-- 
2.11.0

[PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Change the pattern compilation logic under threading so that grep
doesn't compile a pattern it never ends up using on the non-threaded
code path, only to compile it again N times for N threads which will
each use their own copy, ignoring the initially compiled pattern.

This redundant compilation dates back to the initial introduction of
the threaded grep in commit 5b594f457a ("Threaded grep",
2010-01-25).

There was never any reason for doing this redundant work other than an
oversight in the initial commit. Jeff King suggested on-list in
<20170414212325.fefrl3qdjigwy...@sigill.intra.peff.net> that this
might be needed to check the pattern for sanity before threaded
execution commences.

That's not the case. The pattern is compiled under threading in
start_threads() before any concurrent execution has started by calling
pthread_create(), so if the pattern contains an error we still do the
right thing. I.e. die with one error before any threaded execution has
commenced, instead of e.g. spewing out an error for each N threads,
which could be a regression a change like this might inadvertently
introduce.

This change is not meant as an optimization, any performance gains
from this are in the hundreds to thousands of nanoseconds at most. If
we wanted more performance here we could just re-use the compiled
patterns in multiple threads (regcomp(3) is thread-safe), or partially
re-use them and the associated structures in the case of later PCRE
JIT changes.

Rather, it's just to make the code easier to reason about. It's
confusing to debug this under threading & non-threading when the
threading codepaths redundantly compile a pattern which is never used.

The reason the patterns are recompiled is as a side-effect of
duplicating the whole grep_opt structure, which is not thread safe,
writable, and munged during execution. The grep_opt structure then
points to the grep_pat structure where pattern or patterns are stored.

I looked into e.g. splitting the API into some "do & alloc threadsafe
stuff", "spawn thread", "do and alloc non-threadsafe stuff", but the
execution time of grep_opt_dup() & pattern compilation is trivial
compared to actually executing the grep, so there was no point. Even
with the more expensive JIT changes to follow the most expensive PCRE
patterns take something like 0.0X milliseconds to compile at most[1].

The undocumented --debug mode added in commit 17bf35a3c7 ("grep: teach
--debug option to dump the parse tree", 2012-09-13) still works
properly with this change. It only emits debugging info during pattern
compilation, which is now dumped by the pattern compiled just before
the first thread is started.

1. http://sljit.sourceforge.net/pcre.html

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/grep.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index b1095362fb..12e62fcbf3 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -224,7 +224,8 @@ static void start_threads(struct grep_opt *opt)
int err;
struct grep_opt *o = grep_opt_dup(opt);
o->output = strbuf_out;
-   o->debug = 0;
+   if (i)
+   o->debug = 0;
compile_grep_patterns(o);
err = pthread_create([i], NULL, run, o);
 
@@ -1167,8 +1168,6 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
if (!opt.fixed && opt.ignore_case)
opt.regflags |= REG_ICASE;
 
-   compile_grep_patterns();
-
/*
 * We have to find "--" in a separate pass, because its presence
 * influences how we will parse arguments that come before it.
@@ -1245,6 +1244,15 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
num_threads = 0;
 #endif
 
+   if (!num_threads)
+   /*
+* The compiled patterns on the main path are only
+* used when not using threading. Otherwise
+* start_threads() below calls compile_grep_patterns()
+* for each thread.
+*/
+   compile_grep_patterns();
+
 #ifndef NO_PTHREADS
if (num_threads) {
if (!(opt.name_only || opt.unmatch_name_only || opt.count)
-- 
2.11.0

[PATCH v2 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Trivial changes since v1, but I wanted to send a new one for
completeness since I re-sent the "Easy to review grep & pre-PCRE
changes" series.

For v1 see <20170511170142.15934-1-ava...@gmail.com>
(https://public-inbox.org/git/20170511170142.15934-1-ava...@gmail.com/).

Changes noted below & reply to who noted the issue:

Ævar Arnfjörð Bjarmason (7):
  grep: don't redundantly compile throwaway patterns under threading

Brandon: Added a few paragraphs to the commit message about why this
change is being made, i.e. for ease of understanding the code, not
optimization.

  grep: skip pthreads overhead when using one thread
  log: add -P as a synonym for --perl-regexp
  grep: add support for the PCRE v1 JIT API
  grep: un-break building with PCRE < 8.32
  grep: un-break building with PCRE < 8.20
  grep: add support for PCRE v2

No changes in these, except for re-running the performance tests &
changing the commit messages accordingly, due to the change in the
t/perf code in the previous series, as noted there.

 Documentation/rev-list-options.txt |   1 +
 Makefile   |  30 +--
 builtin/grep.c |  16 +++-
 configure.ac   |  77 +---
 grep.c | 180 -
 grep.h |  31 +++
 revision.c |   2 +-
 t/t4202-log.sh |  12 +++
 t/test-lib.sh  |   2 +-
 9 files changed, 327 insertions(+), 24 deletions(-)

-- 
2.11.0

[PATCH v2 27/29] pack-objects: fix buggy warning about threads

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Fix a buggy warning about threads under NO_PTHREADS=YesPlease. Due to
re-using the delta_search_threads variable for both the state of the
"pack.threads" config & the --threads option, setting "pack.threads"
but not supplying --threads would trigger the warning for both
"pack.threads" & --threads.

Solve this bug by resetting the delta_search_threads variable in
git_pack_config(), it might then be set by --threads again and be
subsequently warned about, as the test I'm changing here asserts.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/pack-objects.c | 4 +++-
 t/t5300-pack-object.sh | 3 +--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0fe35d1b5a..f1baf05dfe 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2472,8 +2472,10 @@ static int git_pack_config(const char *k, const char *v, 
void *cb)
die("invalid number of threads specified (%d)",
delta_search_threads);
 #ifdef NO_PTHREADS
-   if (delta_search_threads != 1)
+   if (delta_search_threads != 1) {
warning("no threads support, ignoring %s", k);
+   delta_search_threads = 0;
+   }
 #endif
return 0;
}
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 1629fa80b0..0ec25c4966 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -445,8 +445,7 @@ test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'pack-objects 
--threads=N or pack.
git -c pack.threads=2 pack-objects --stdout --all /dev/null 
2>err &&
cat err &&
grep ^warning: err >warnings &&
-   test_line_count = 2 warnings &&
-   grep -F "no threads support, ignoring --threads" err &&
+   test_line_count = 1 warnings &&
grep -F "no threads support, ignoring pack.threads" err &&
git -c pack.threads=2 pack-objects --threads=4 --stdout --all 
/dev/null 2>err &&
grep ^warning: err >warnings &&
-- 
2.11.0

[PATCH v2 20/29] grep: remove redundant `regflags &= ~REG_EXTENDED` assignments

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Remove redundant assignments to the "regflags" variable. There are no
code paths that have previously set the regflags to anything, and
certainly not to `|= REG_EXTENDED`.

This code gave the impression that it had to reset its environment,
but it doesn't. This dates back to the initial introduction of
git-grep in commit 5010cb5fcc ("built-in "git grep"", 2006-04-30).

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/grep.c b/grep.c
index 59ae7809f2..bf6c2494fd 100644
--- a/grep.c
+++ b/grep.c
@@ -179,7 +179,6 @@ static void grep_set_pattern_type_option(enum 
grep_pattern_type pattern_type, st
case GREP_PATTERN_TYPE_BRE:
opt->fixed = 0;
opt->pcre = 0;
-   opt->regflags &= ~REG_EXTENDED;
break;
 
case GREP_PATTERN_TYPE_ERE:
@@ -191,7 +190,6 @@ static void grep_set_pattern_type_option(enum 
grep_pattern_type pattern_type, st
case GREP_PATTERN_TYPE_FIXED:
opt->fixed = 1;
opt->pcre = 0;
-   opt->regflags &= ~REG_EXTENDED;
break;
 
case GREP_PATTERN_TYPE_PCRE:
@@ -414,10 +412,9 @@ static void compile_fixed_regexp(struct grep_pat *p, 
struct grep_opt *opt)
 {
struct strbuf sb = STRBUF_INIT;
int err;
-   int regflags;
+   int regflags = opt->regflags;
 
basic_regex_quote_buf(, p->pattern);
-   regflags = opt->regflags & ~REG_EXTENDED;
if (opt->ignore_case)
regflags |= REG_ICASE;
err = regcomp(>regexp, sb.buf, regflags);
-- 
2.11.0

[PATCH v2 25/29] test-lib: add a PTHREADS prerequisite

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Add a PTHREADS prerequisite which is false when git is compiled with
NO_PTHREADS=YesPlease.

There's lots of custom code that runs when threading isn't available,
but before this prerequisite there was no way to test it.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile  | 1 +
 t/README  | 4 
 t/test-lib.sh | 1 +
 3 files changed, 6 insertions(+)

diff --git a/Makefile b/Makefile
index 374fbc7e58..a79274e5e6 100644
--- a/Makefile
+++ b/Makefile
@@ -2242,6 +2242,7 @@ GIT-BUILD-OPTIONS: FORCE
@echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+
@echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' 
>>$@+
@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
+   @echo NO_PTHREADS=\''$(subst ','\'',$(subst ','\'',$(NO_PTHREADS)))'\' 
>>$@+
@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst 
','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
@echo PAGER_ENV=\''$(subst ','\'',$(subst ','\'',$(PAGER_ENV)))'\' >>$@+
diff --git a/t/README b/t/README
index a90cb62583..2f95860369 100644
--- a/t/README
+++ b/t/README
@@ -817,6 +817,10 @@ use these, and "test_set_prereq" for how to define your 
own.
Test is run on a filesystem which converts decomposed utf-8 (nfd)
to precomposed utf-8 (nfc).
 
+ - PTHREADS
+
+   Git wasn't compiled with NO_PTHREADS=YesPlease.
+
 Tips for Writing Tests
 --
 
diff --git a/t/test-lib.sh b/t/test-lib.sh
index e5cfbcc36b..ab92c0ebaa 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1009,6 +1009,7 @@ esac
 
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
+test -z "$NO_PTHREADS" && test_set_prereq PTHREADS
 test -z "$NO_PYTHON" && test_set_prereq PYTHON
 test -n "$USE_LIBPCRE1" && test_set_prereq PCRE
 test -z "$NO_GETTEXT" && test_set_prereq GETTEXT
-- 
2.11.0

[PATCH v2 23/29] grep: change internal pcre variable & function names to be pcre1

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Change the internal PCRE variable & function names to have a "1"
suffix. This is for preparation for libpcre2 support, where having
non-versioned names would be confusing.

An earlier change in this series ("grep: change the internal PCRE
macro names to be PCRE1", 2017-04-07) elaborates on the motivations
behind this change.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.c | 52 ++--
 grep.h |  8 
 2 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/grep.c b/grep.c
index 854f2404be..07512346b1 100644
--- a/grep.c
+++ b/grep.c
@@ -178,23 +178,23 @@ static void grep_set_pattern_type_option(enum 
grep_pattern_type pattern_type, st
 
case GREP_PATTERN_TYPE_BRE:
opt->fixed = 0;
-   opt->pcre = 0;
+   opt->pcre1 = 0;
break;
 
case GREP_PATTERN_TYPE_ERE:
opt->fixed = 0;
-   opt->pcre = 0;
+   opt->pcre1 = 0;
opt->regflags |= REG_EXTENDED;
break;
 
case GREP_PATTERN_TYPE_FIXED:
opt->fixed = 1;
-   opt->pcre = 0;
+   opt->pcre1 = 0;
break;
 
case GREP_PATTERN_TYPE_PCRE:
opt->fixed = 0;
-   opt->pcre = 1;
+   opt->pcre1 = 1;
break;
}
 }
@@ -334,7 +334,7 @@ static int has_null(const char *s, size_t len)
 }
 
 #ifdef USE_LIBPCRE1
-static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
+static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt 
*opt)
 {
const char *error;
int erroffset;
@@ -342,23 +342,23 @@ static void compile_pcre_regexp(struct grep_pat *p, const 
struct grep_opt *opt)
 
if (opt->ignore_case) {
if (has_non_ascii(p->pattern))
-   p->pcre_tables = pcre_maketables();
+   p->pcre1_tables = pcre_maketables();
options |= PCRE_CASELESS;
}
if (is_utf8_locale() && has_non_ascii(p->pattern))
options |= PCRE_UTF8;
 
-   p->pcre_regexp = pcre_compile(p->pattern, options, , ,
- p->pcre_tables);
-   if (!p->pcre_regexp)
+   p->pcre1_regexp = pcre_compile(p->pattern, options, , ,
+ p->pcre1_tables);
+   if (!p->pcre1_regexp)
compile_regexp_failed(p, error);
 
-   p->pcre_extra_info = pcre_study(p->pcre_regexp, 0, );
-   if (!p->pcre_extra_info && error)
+   p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, );
+   if (!p->pcre1_extra_info && error)
die("%s", error);
 }
 
-static int pcrematch(struct grep_pat *p, const char *line, const char *eol,
+static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
regmatch_t *match, int eflags)
 {
int ovector[30], ret, flags = 0;
@@ -366,7 +366,7 @@ static int pcrematch(struct grep_pat *p, const char *line, 
const char *eol,
if (eflags & REG_NOTBOL)
flags |= PCRE_NOTBOL;
 
-   ret = pcre_exec(p->pcre_regexp, p->pcre_extra_info, line, eol - line,
+   ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line,
0, flags, ovector, ARRAY_SIZE(ovector));
if (ret < 0 && ret != PCRE_ERROR_NOMATCH)
die("pcre_exec failed with error code %d", ret);
@@ -379,25 +379,25 @@ static int pcrematch(struct grep_pat *p, const char 
*line, const char *eol,
return ret;
 }
 
-static void free_pcre_regexp(struct grep_pat *p)
+static void free_pcre1_regexp(struct grep_pat *p)
 {
-   pcre_free(p->pcre_regexp);
-   pcre_free(p->pcre_extra_info);
-   pcre_free((void *)p->pcre_tables);
+   pcre_free(p->pcre1_regexp);
+   pcre_free(p->pcre1_extra_info);
+   pcre_free((void *)p->pcre1_tables);
 }
 #else /* !USE_LIBPCRE1 */
-static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
+static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt 
*opt)
 {
die("cannot use Perl-compatible regexes when not compiled with 
USE_LIBPCRE");
 }
 
-static int pcrematch(struct grep_pat *p, const char *line, const char *eol,
+static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
regmatch_t *match, int eflags)
 {
return 1;
 }
 
-static void free_pcre_regexp(struct grep_pat *p)
+static void free_pcre1_regexp(struct grep_pat *p)
 {
 }
 #endif /* !USE_LIBPCRE1 */
@@ -477,8 +477,8 @@ static void compile_regexp(struct grep_pat *p, struct 
grep_opt *opt)
return;
}
 
-   if (opt->pcre) {
-   compile_pcre_regexp(p, opt);
+   if (opt->pcre1) {
+   compile_pcre1_regexp(p, opt);
return;
}
 
@@ -834,8 +834,8 @@ void

[PATCH v2 26/29] pack-objects & index-pack: add test for --threads warning

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Add a test for the warning that's emitted when --threads or
pack.threads is provided under NO_PTHREADS=YesPlease. This uses the
new PTHREADS prerequisite.

The assertion for C_LOCALE_OUTPUT in the latter test is currently
redundant, since unlike index-pack the pack-objects warnings aren't
i18n'd. However they might be changed to be i18n'd in the future, and
there's no harm in future-proofing the test.

There's an existing bug in the implementation of pack-objects which
this test currently tests for as-is. Details about the bug & the fix
are included in a follow-up change.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t5300-pack-object.sh | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 43a672c345..1629fa80b0 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -421,6 +421,40 @@ test_expect_success 'index-pack  works in non-repo' '
test_path_is_file foo.idx
 '
 
+test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'index-pack --threads=N or 
pack.threads=N warns when no pthreads' '
+   test_must_fail git index-pack --threads=2 2>err &&
+   grep ^warning: err >warnings &&
+   test_line_count = 1 warnings &&
+   grep -F "no threads support, ignoring --threads=2" err &&
+   test_must_fail git -c pack.threads=2 index-pack 2>err &&
+   grep ^warning: err >warnings &&
+   test_line_count = 1 warnings &&
+   grep -F "no threads support, ignoring pack.threads" err &&
+   test_must_fail git -c pack.threads=2 index-pack --threads=4 2>err &&
+   grep ^warning: err >warnings &&
+   test_line_count = 2 warnings &&
+   grep -F "no threads support, ignoring --threads=4" err &&
+   grep -F "no threads support, ignoring pack.threads" err
+'
+
+test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'pack-objects --threads=N or 
pack.threads=N warns when no pthreads' '
+   git pack-objects --threads=2 --stdout --all /dev/null 2>err 
&&
+   grep ^warning: err >warnings &&
+   test_line_count = 1 warnings &&
+   grep -F "no threads support, ignoring --threads" err &&
+   git -c pack.threads=2 pack-objects --stdout --all /dev/null 
2>err &&
+   cat err &&
+   grep ^warning: err >warnings &&
+   test_line_count = 2 warnings &&
+   grep -F "no threads support, ignoring --threads" err &&
+   grep -F "no threads support, ignoring pack.threads" err &&
+   git -c pack.threads=2 pack-objects --threads=4 --stdout --all 
/dev/null 2>err &&
+   grep ^warning: err >warnings &&
+   test_line_count = 2 warnings &&
+   grep -F "no threads support, ignoring --threads" err &&
+   grep -F "no threads support, ignoring pack.threads" err
+'
+
 #
 # WARNING!
 #
-- 
2.11.0

[PATCH v2 24/29] grep: move is_fixed() earlier to avoid forward declaration

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Move the is_fixed() function which are currently only used in
compile_regexp() earlier so it can be used in the PCRE family of
functions in a later change.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/grep.c b/grep.c
index 07512346b1..1157529115 100644
--- a/grep.c
+++ b/grep.c
@@ -321,6 +321,18 @@ static NORETURN void compile_regexp_failed(const struct 
grep_pat *p,
die("%s'%s': %s", where, p->pattern, error);
 }
 
+static int is_fixed(const char *s, size_t len)
+{
+   size_t i;
+
+   for (i = 0; i < len; i++) {
+   if (is_regex_special(s[i]))
+   return 0;
+   }
+
+   return 1;
+}
+
 static int has_null(const char *s, size_t len)
 {
/*
@@ -402,18 +414,6 @@ static void free_pcre1_regexp(struct grep_pat *p)
 }
 #endif /* !USE_LIBPCRE1 */
 
-static int is_fixed(const char *s, size_t len)
-{
-   size_t i;
-
-   for (i = 0; i < len; i++) {
-   if (is_regex_special(s[i]))
-   return 0;
-   }
-
-   return 1;
-}
-
 static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
struct strbuf sb = STRBUF_INIT;
-- 
2.11.0

[PATCH v2 29/29] grep: assert that threading is enabled when calling grep_{lock,unlock}

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Change the grep_{lock,unlock} functions to assert that num_threads is
true, instead of only locking & unlocking the pthread mutex lock when
it is.

These functions are never called when num_threads isn't true, this
logic has gone through multiple iterations since the initial
introduction of grep threading in commit 5b594f457a ("Threaded grep",
2010-01-25), but ever since then they'd only be called if num_threads
was true, so this check made the code confusing to read.

Replace the check with an assertion, so that it's clear to the reader
that this code path is never taken unless we're spawning threads.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/grep.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 3c721b75a5..b1095362fb 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -73,14 +73,14 @@ static pthread_mutex_t grep_mutex;
 
 static inline void grep_lock(void)
 {
-   if (num_threads)
-   pthread_mutex_lock(_mutex);
+   assert(num_threads);
+   pthread_mutex_lock(_mutex);
 }
 
 static inline void grep_unlock(void)
 {
-   if (num_threads)
-   pthread_mutex_unlock(_mutex);
+   assert(num_threads);
+   pthread_mutex_unlock(_mutex);
 }
 
 /* Signalled when a new work_item is added to todo. */
-- 
2.11.0

[PATCH v2 28/29] grep: given --threads with NO_PTHREADS=YesPlease, warn

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Add a warning about missing thread support when grep.threads or
--threads is set to a non 0 (default) or 1 (no parallelism) value
under NO_PTHREADS=YesPlease.

This is for consistency with the index-pack & pack-objects commands,
which also take a --threads option & are configurable via
pack.threads, and have long warned about the same under
NO_PTHREADS=YesPlease.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/grep.c  | 13 +
 t/t7810-grep.sh | 18 ++
 2 files changed, 31 insertions(+)

diff --git a/builtin/grep.c b/builtin/grep.c
index a191e2976b..3c721b75a5 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -289,6 +289,17 @@ static int grep_cmd_config(const char *var, const char 
*value, void *cb)
if (num_threads < 0)
die(_("invalid number of threads specified (%d) for 
%s"),
num_threads, var);
+#ifdef NO_PTHREADS
+   else if (num_threads && num_threads != 1) {
+   /*
+* TRANSLATORS: %s is the configuration
+* variable for tweaking threads, currently
+* grep.threads
+*/
+   warning(_("no threads support, ignoring %s"), var);
+   num_threads = 0;
+   }
+#endif
}
 
return st;
@@ -1229,6 +1240,8 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
else if (num_threads < 0)
die(_("invalid number of threads specified (%d)"), num_threads);
 #else
+   if (num_threads)
+   warning(_("no threads support, ignoring --threads"));
num_threads = 0;
 #endif
 
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 561709ef6a..f106387820 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -791,6 +791,24 @@ do
"
 done
 
+test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'grep --threads=N or 
pack.threads=N warns when no pthreads' '
+   git grep --threads=2 Hello hello_world 2>err &&
+   grep ^warning: err >warnings &&
+   test_line_count = 1 warnings &&
+   grep -F "no threads support, ignoring --threads" err &&
+   git -c grep.threads=2 grep Hello hello_world 2>err &&
+   grep ^warning: err >warnings &&
+   test_line_count = 1 warnings &&
+   grep -F "no threads support, ignoring grep.threads" err &&
+   git -c grep.threads=2 grep --threads=4 Hello hello_world 2>err &&
+   grep ^warning: err >warnings &&
+   test_line_count = 2 warnings &&
+   grep -F "no threads support, ignoring --threads" err &&
+   grep -F "no threads support, ignoring grep.threads" err &&
+   git -c grep.threads=0 grep --threads=0 Hello hello_world 2>err &&
+   test_line_count = 0 err
+'
+
 test_expect_success 'grep from a subdirectory to search wider area (1)' '
mkdir -p s &&
(
-- 
2.11.0

[PATCH v2 22/29] grep: change the internal PCRE macro names to be PCRE1

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Change the internal USE_LIBPCRE define, & build options flag to use a
naming convention ending in PCRE1, without changing the long-standing
USE_LIBPCRE Makefile flag which enables this code.

This is for preparation for libpcre2 support where having things like
USE_LIBPCRE and USE_LIBPCRE2 in any more places than we absolutely
need to for backwards compatibility with old Makefile arguments would
be confusing.

In some ways it would be better to change everything that now uses
USE_LIBPCRE to use USE_LIBPCRE1, and to make specifying
USE_LIBPCRE (or --with-pcre) an error. This would impose a one-time
burden on packagers of git to s/USE_LIBPCRE/USE_LIBPCRE1/ in their
build scripts.

However I'd like to leave the door open to making
USE_LIBPCRE=YesPlease eventually mean USE_LIBPCRE2=YesPlease,
i.e. once PCRE v2 is ubiquitous enough that it makes sense to make it
the default.

This code and the USE_LIBPCRE Makefile argument was added in commit
63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09). At the time there was
no indication that the PCRE project would release an entirely new &
incompatible API around 3 years later.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile  | 4 ++--
 grep.c| 6 +++---
 grep.h| 2 +-
 t/test-lib.sh | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/Makefile b/Makefile
index d1587452f1..374fbc7e58 100644
--- a/Makefile
+++ b/Makefile
@@ -1088,7 +1088,7 @@ ifdef NO_LIBGEN_H
 endif
 
 ifdef USE_LIBPCRE
-   BASIC_CFLAGS += -DUSE_LIBPCRE
+   BASIC_CFLAGS += -DUSE_LIBPCRE1
ifdef LIBPCREDIR
BASIC_CFLAGS += -I$(LIBPCREDIR)/include
EXTLIBS += -L$(LIBPCREDIR)/$(lib) 
$(CC_LD_DYNPATH)$(LIBPCREDIR)/$(lib)
@@ -2240,7 +2240,7 @@ GIT-BUILD-OPTIONS: FORCE
@echo TAR=\''$(subst ','\'',$(subst ','\'',$(TAR)))'\' >>$@+
@echo NO_CURL=\''$(subst ','\'',$(subst ','\'',$(NO_CURL)))'\' >>$@+
@echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+
-   @echo USE_LIBPCRE=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' 
>>$@+
+   @echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' 
>>$@+
@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst 
','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
diff --git a/grep.c b/grep.c
index 79eb681c6e..854f2404be 100644
--- a/grep.c
+++ b/grep.c
@@ -333,7 +333,7 @@ static int has_null(const char *s, size_t len)
return 0;
 }
 
-#ifdef USE_LIBPCRE
+#ifdef USE_LIBPCRE1
 static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
 {
const char *error;
@@ -385,7 +385,7 @@ static void free_pcre_regexp(struct grep_pat *p)
pcre_free(p->pcre_extra_info);
pcre_free((void *)p->pcre_tables);
 }
-#else /* !USE_LIBPCRE */
+#else /* !USE_LIBPCRE1 */
 static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
 {
die("cannot use Perl-compatible regexes when not compiled with 
USE_LIBPCRE");
@@ -400,7 +400,7 @@ static int pcrematch(struct grep_pat *p, const char *line, 
const char *eol,
 static void free_pcre_regexp(struct grep_pat *p)
 {
 }
-#endif /* !USE_LIBPCRE */
+#endif /* !USE_LIBPCRE1 */
 
 static int is_fixed(const char *s, size_t len)
 {
diff --git a/grep.h b/grep.h
index 267534ca24..073b0e4c92 100644
--- a/grep.h
+++ b/grep.h
@@ -1,7 +1,7 @@
 #ifndef GREP_H
 #define GREP_H
 #include "color.h"
-#ifdef USE_LIBPCRE
+#ifdef USE_LIBPCRE1
 #include 
 #else
 typedef int pcre;
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 6abf1d1918..e5cfbcc36b 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1010,7 +1010,7 @@ esac
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PYTHON" && test_set_prereq PYTHON
-test -n "$USE_LIBPCRE" && test_set_prereq PCRE
+test -n "$USE_LIBPCRE1" && test_set_prereq PCRE
 test -z "$NO_GETTEXT" && test_set_prereq GETTEXT
 
 # Can we rely on git's output in the C locale?
-- 
2.11.0

[PATCH v2 21/29] grep: factor test for \0 in grep patterns into a function

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Factor the test for \0 in grep patterns into a function. Since commit
9eceddeec6 ("Use kwset in grep", 2011-08-21) any pattern containing a
\0 is considered fixed as regcomp() can't handle it.

This limitation was never documented, and other some regular
expression engines are capable of compiling a pattern containing a
\0. Factoring this out makes a subsequent change which does that
smaller.

See a previous commit in this series ("grep: add tests to fix blind
spots with \0 patterns", 2017-04-21) for further details & rationale.

While I'm at it make the comment conform to the style guide, i.e. add
an opening "/*\n".

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/grep.c b/grep.c
index bf6c2494fd..79eb681c6e 100644
--- a/grep.c
+++ b/grep.c
@@ -321,6 +321,18 @@ static NORETURN void compile_regexp_failed(const struct 
grep_pat *p,
die("%s'%s': %s", where, p->pattern, error);
 }
 
+static int has_null(const char *s, size_t len)
+{
+   /*
+* regcomp cannot accept patterns with NULs so when using it
+* we consider any pattern containing a NUL fixed.
+*/
+   if (memchr(s, 0, len))
+   return 1;
+
+   return 0;
+}
+
 #ifdef USE_LIBPCRE
 static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
 {
@@ -394,12 +406,6 @@ static int is_fixed(const char *s, size_t len)
 {
size_t i;
 
-   /* regcomp cannot accept patterns with NULs so we
-* consider any pattern containing a NUL fixed.
-*/
-   if (memchr(s, 0, len))
-   return 1;
-
for (i = 0; i < len; i++) {
if (is_regex_special(s[i]))
return 0;
@@ -451,7 +457,7 @@ static void compile_regexp(struct grep_pat *p, struct 
grep_opt *opt)
 * simple string match using kws.  p->fixed tells us if we
 * want to use kws.
 */
-   if (opt->fixed || is_fixed(p->pattern, p->patternlen))
+   if (opt->fixed || has_null(p->pattern, p->patternlen) || 
is_fixed(p->pattern, p->patternlen))
p->fixed = !icase || ascii_only;
else
p->fixed = 0;
-- 
2.11.0

[PATCH v2 17/29] perf: add a performance comparison of fixed-string grep

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Add a performance comparison test which compares both case-sensitive &
case-insensitive fixed-string grep, as well as non-ASCII
case-sensitive & case-insensitive grep.

Currently only the "-i æ" performance test doesn't go through the same
kwset.[ch] codepath, see the "Even when -F..." comment in grep.c.

$ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux ./run 
p7821-grep-engines-fixed.sh
--
7821.1: fixed grep int 1.75(1.39+0.34)
7821.2: basic grep int 1.68(1.38+0.28)
7821.3: extended grep int  1.75(1.41+0.29)
7821.4: perl grep int  1.73(1.40+0.30)
7821.6: fixed grep -i int  1.94(1.54+0.35)
7821.7: basic grep -i int  1.92(1.57+0.32)
7821.8: extended grep -i int   1.90(1.54+0.30)
7821.9: perl grep -i int   1.91(1.53+0.36)
7821.11: fixed grep æ  1.35(1.14+0.18)
7821.12: basic grep æ  1.34(1.16+0.16)
7821.13: extended grep æ   1.33(1.15+0.17)
7821.14: perl grep æ   1.35(1.12+0.20)
7821.16: fixed grep -i æ   0.72(0.49+0.22)
7821.17: basic grep -i æ   0.74(0.49+0.21)
7821.18: extended grep -i æ0.72(0.48+0.22)
7821.19: perl grep -i æ0.71(0.44+0.23)

I'm planning to make that not be the case, this performance test gives
a baseline for comparing performance before & after any such change.

See commit ("perf: add a performance comparison test of grep -G, -E
and -P", 2017-04-19) for details on the machine the above test run was
executed on.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/perf/p7821-grep-engines-fixed.sh | 26 ++
 1 file changed, 26 insertions(+)
 create mode 100755 t/perf/p7821-grep-engines-fixed.sh

diff --git a/t/perf/p7821-grep-engines-fixed.sh 
b/t/perf/p7821-grep-engines-fixed.sh
new file mode 100755
index 00..d771cccfdf
--- /dev/null
+++ b/t/perf/p7821-grep-engines-fixed.sh
@@ -0,0 +1,26 @@
+#!/bin/sh
+
+test_description="Comparison of fixed string grep under git-grep's regex 
engines"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+test_checkout_worktree
+
+for args in 'int' '-i int' 'æ' '-i æ'
+do
+   for engine in fixed basic extended perl
+   do
+   test_perf "$engine grep $args" "
+   git -c grep.patternType=$engine grep $args 
>'out.$engine.$args' || :
+   "
+   done
+
+   test_expect_success "assert that all engines found the same for $args" "
+   test_cmp 'out.fixed.$args' 'out.basic.$args' &&
+   test_cmp 'out.fixed.$args' 'out.extended.$args' &&
+   test_cmp 'out.fixed.$args' 'out.perl.$args'
+   "
+done
+
+test_done
-- 
2.11.0

[PATCH v2 19/29] grep: remove redundant regflags assignment under PCRE

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Remove a redundant assignment to the "regflags" variable. This
variable is only used for POSIX regular expression matching, not when
the PCRE library is used.

This redundant assignment was added as a result of copy/paste
programming in commit 84befcd0a4 ("grep: add a grep.patternType
configuration setting", 2012-08-03). That commit modified already
working code in commit cca2c172e0 ("git-grep: do not die upon -F/-P
when grep.extendedRegexp is set.", 2011-05-09) which didn't assign to
regflags when under PCRE.

Revert back to that behavior, more to reduce "wait this is used under
PCRE how?" confusion when reading the code, than to to save ourselves
trivial CPU cycles by removing one assignment.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/grep.c b/grep.c
index 47cee45067..59ae7809f2 100644
--- a/grep.c
+++ b/grep.c
@@ -197,7 +197,6 @@ static void grep_set_pattern_type_option(enum 
grep_pattern_type pattern_type, st
case GREP_PATTERN_TYPE_PCRE:
opt->fixed = 0;
opt->pcre = 1;
-   opt->regflags &= ~REG_EXTENDED;
break;
}
 }
-- 
2.11.0

[PATCH v2 13/29] grep: add tests to fix blind spots with \0 patterns

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Address a big blind spot in the tests for patterns containing \0. The
is_fixed() function considers any string that contains \0 fixed, even
if it contains regular expression metacharacters, those patterns are
currently matched with kwset.

Before this change removing that memchr(s, 0, len) check from
is_fixed() wouldn't change the result of any of the tests, since
regcomp() will happily match the part before the \0.

Furthermore, the kwset path is dependent on whether the the -i flag is
on, and whether the pattern has any non-ASCII characters, but none of
this was tested for.

See the a previous commit in this series ("grep: add tests to fix
blind spots with \0 patterns", 2017-04-21) for further details &
rationale.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7008-grep-binary.sh | 71 ++
 1 file changed, 71 insertions(+)

diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index e7754c3946..079395ba54 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -22,6 +22,18 @@ nul_match () {
printf '$pattern' | q_to_nul >f &&
test_must_fail git grep -f f $flags a
"
+   elif test "$status" = T1
+   then
+   test_expect_failure "git grep -f f $flags '$pattern_human' a" "
+   printf '$pattern' | q_to_nul >f &&
+   git grep -f f $flags a
+   "
+   elif test "$status" = T0
+   then
+   test_expect_failure "git grep -f f $flags '$pattern_human' a" "
+   printf '$pattern' | q_to_nul >f &&
+   test_must_fail git grep -f f $flags a
+   "
else
test_expect_success "PANIC: Test framework error. Unknown 
status $status" 'false'
fi
@@ -98,6 +110,65 @@ nul_match 1 '-Fi' 'YQf'
 nul_match 0 '-Fi' 'YQx'
 nul_match 1 '' 'yQf'
 nul_match 0 '' 'yQx'
+nul_match 1 '' 'æQð'
+nul_match 1 '-F' 'eQm[*]c'
+nul_match 1 '-Fi' 'EQM[*]C'
+
+# Regex patterns that would match but shouldn't with -F
+nul_match 0 '-F' 'yQ[f]'
+nul_match 0 '-F' '[y]Qf'
+nul_match 0 '-Fi' 'YQ[F]'
+nul_match 0 '-Fi' '[Y]QF'
+nul_match 0 '-F' 'æQ[ð]'
+nul_match 0 '-F' '[æ]Qð'
+nul_match 0 '-Fi' 'ÆQ[Ð]'
+nul_match 0 '-Fi' '[Æ]QÐ'
+
+# kwset is disabled on -i & non-ASCII. No way to match non-ASCII \0
+# patterns case-insensitively.
+nul_match T1 '-i' 'ÆQÐ'
+
+# \0 implicitly disables regexes. This is an undocumented internal
+# limitation.
+nul_match T1 '' 'yQ[f]'
+nul_match T1 '' '[y]Qf'
+nul_match T1 '-i' 'YQ[F]'
+nul_match T1 '-i' '[Y]Qf'
+nul_match T1 '' 'æQ[ð]'
+nul_match T1 '' '[æ]Qð'
+nul_match T1 '-i' 'ÆQ[Ð]'
+
+# ... because of \0 implicitly disabling regexes regexes that
+# should/shouldn't match don't do the right thing.
+nul_match T1 '' 'eQm.*cQ'
+nul_match T1 '-i' 'EQM.*cQ'
+nul_match T0 '' 'eQm[*]c'
+nul_match T0 '-i' 'EQM[*]C'
+
+# Due to the REG_STARTEND extension when kwset() is disabled on -i &
+# non-ASCII the string will be matched in its entirety, but the
+# pattern will be cut off at the first \0.
+nul_match 0 '-i' 'NOMATCHQð'
+nul_match T0 '-i' '[Æ]QNOMATCH'
+nul_match T0 '-i' '[æ]QNOMATCH'
+# Matches, but for the wrong reasons, just stops at [æ]
+nul_match 1 '-i' '[Æ]Qð'
+nul_match 1 '-i' '[æ]Qð'
+
+# Ensure that the matcher doesn't regress to something that stops at
+# \0
+nul_match 0 '-F' 'yQ[f]'
+nul_match 0 '-Fi' 'YQ[F]'
+nul_match 0 '' 'yQNOMATCH'
+nul_match 0 '' 'QNOMATCH'
+nul_match 0 '-i' 'YQNOMATCH'
+nul_match 0 '-i' 'QNOMATCH'
+nul_match 0 '-F' 'æQ[ð]'
+nul_match 0 '-Fi' 'ÆQ[Ð]'
+nul_match 0 '' 'yQNÓMATCH'
+nul_match 0 '' 'QNÓMATCH'
+nul_match 0 '-i' 'YQNÓMATCH'
+nul_match 0 '-i' 'QNÓMATCH'
 
 test_expect_success 'grep respects binary diff attribute' '
echo text >t &&
-- 
2.11.0

[PATCH v2 14/29] perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Add a git GIT_PERF_MAKE_COMMAND variable to compliment the existing
GIT_PERF_MAKE_OPTS facility. This allows specifying an arbitrary shell
command to execute instead of 'make'.

This is useful e.g. in cases where the name, semantics or defaults of
a Makefile flag have changed over time. It can even be used to change
the contents of the tree, useful for monkeypatching ancient versions
of git to get them to build.

This opens Pandora's box in some ways, it's now possible to
"jailbreak" the perf environment and e.g. modify the source tree via
this arbitrary instead of just issuing a custom "make" command, such a
command has to be re-entrant in the sense that subsequent perf runs
will re-use the possibly modified tree.

It would be pointless to try to mitigate or work around that caveat in
a tool purely aimed at Git developers, so this change makes no attempt
to do so.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile  |  3 +++
 t/perf/README | 19 +--
 t/perf/run| 11 +--
 3 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/Makefile b/Makefile
index eedadb8056..d1587452f1 100644
--- a/Makefile
+++ b/Makefile
@@ -2272,6 +2272,9 @@ endif
 ifdef GIT_PERF_MAKE_OPTS
@echo GIT_PERF_MAKE_OPTS=\''$(subst ','\'',$(subst 
','\'',$(GIT_PERF_MAKE_OPTS)))'\' >>$@+
 endif
+ifdef GIT_PERF_MAKE_COMMAND
+   @echo GIT_PERF_MAKE_COMMAND=\''$(subst ','\'',$(subst 
','\'',$(GIT_PERF_MAKE_COMMAND)))'\' >>$@+
+endif
 ifdef GIT_INTEROP_MAKE_OPTS
@echo GIT_INTEROP_MAKE_OPTS=\''$(subst ','\'',$(subst 
','\'',$(GIT_INTEROP_MAKE_OPTS)))'\' >>$@+
 endif
diff --git a/t/perf/README b/t/perf/README
index 49ea4349be..b3d95042a8 100644
--- a/t/perf/README
+++ b/t/perf/README
@@ -60,8 +60,23 @@ You can set the following variables (also in your 
config.mak):
 
 GIT_PERF_MAKE_OPTS
Options to use when automatically building a git tree for
-   performance testing.  E.g., -j6 would be useful.
-
+   performance testing. E.g., -j6 would be useful. Passed
+   directly to make as "make $GIT_PERF_MAKE_OPTS".
+
+GIT_PERF_MAKE_COMMAND
+   An arbitrary command that'll be run in place of the make
+   command, if set the GIT_PERF_MAKE_OPTS variable is
+   ignored. Useful in cases where source tree changes might
+   require issuing a different make command to different
+   revisions.
+
+   This can be (ab)used to monkeypatch or otherwise change the
+   tree about to be built. Note that the build directory can be
+   re-used for subsequent runs so the make command might get
+   executed multiple times on the same tree, but don't count on
+   any of that, that's an implementation detail that might change
+   in the future.
+ 
 GIT_PERF_REPO
 GIT_PERF_LARGE_REPO
Repositories to copy for the performance tests.  The normal
diff --git a/t/perf/run b/t/perf/run
index c788d713ae..b61024a830 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -37,8 +37,15 @@ build_git_rev () {
cp "../../$config" "build/$rev/"
fi
done
-   (cd build/$rev && make $GIT_PERF_MAKE_OPTS) ||
-   die "failed to build revision '$mydir'"
+   (
+   cd build/$rev &&
+   if test -n "$GIT_PERF_MAKE_COMMAND"
+   then
+   sh -c "$GIT_PERF_MAKE_COMMAND"
+   else
+   make $GIT_PERF_MAKE_OPTS
+   fi
+   ) || die "failed to build revision '$mydir'"
 }
 
 run_dirs_helper () {
-- 
2.11.0

[PATCH v2 16/29] perf: add a performance comparison test of grep -G, -E and -P

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Add a very basic performance comparison test comparing the POSIX
basic, extended and perl engines.

In theory the "basic" and "extended" engines should be implemented
using the same underlying code with a slightly different pattern
parser, but some implementations may not do this. Jump through some
slight hoops to test both, which is worthwhile since "basic" is the
default.

Running this on an i7 3.4GHz Linux 4.9.0-2 Debian testing against a
checkout of linux.git & latest upstream PCRE, both PCRE and git
compiled with -O3 using gcc 7.1.1:

$ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux ./run 
p7820-grep-engines.sh
[...]
-
7820.1: basic grep how.to 0.28(1.27+0.41)
7820.2: extended grep how.to  0.29(1.23+0.44)
7820.3: perl grep how.to  0.27(1.14+0.47)
7820.5: basic grep ^how to0.28(1.18+0.46)
7820.6: extended grep ^how to 0.28(1.25+0.40)
7820.7: perl grep ^how to 0.48(2.76+0.40)
7820.9: basic grep [how] to   0.42(2.20+0.42)
7820.10: extended grep [how] to   0.41(2.12+0.50)
7820.11: perl grep [how] to   0.48(2.59+0.42)
7820.13: basic grep \(e.t[^ ]*\|v.ry\) rare   0.55(3.26+0.44)
7820.14: extended grep (e.t[^ ]*|v.ry) rare   0.54(3.34+0.38)
7820.15: perl grep (e.t[^ ]*|v.ry) rare   0.88(5.82+0.35)
7820.17: basic grep m\(ú\|u\)lt.b\(æ\|y\)te   0.29(1.26+0.48)
7820.18: extended grep m(ú|u)lt.b(æ|y)te  0.29(1.36+0.39)
7820.19: perl grep m(ú|u)lt.b(æ|y)te  0.32(1.61+0.43)

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/perf/p7820-grep-engines.sh | 35 +++
 1 file changed, 35 insertions(+)
 create mode 100755 t/perf/p7820-grep-engines.sh

diff --git a/t/perf/p7820-grep-engines.sh b/t/perf/p7820-grep-engines.sh
new file mode 100755
index 00..eb439900f3
--- /dev/null
+++ b/t/perf/p7820-grep-engines.sh
@@ -0,0 +1,35 @@
+#!/bin/sh
+
+test_description="Comparison of git-grep's regex engines"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+test_checkout_worktree
+
+for pattern in \
+   'how.to' \
+   '^how to' \
+   '[how] to' \
+   '\(e.t[^ ]*\|v.ry\) rare' \
+   'm\(ú\|u\)lt.b\(æ\|y\)te'
+do
+   for engine in basic extended perl
+   do
+   if test $engine != "basic"
+   then
+   # Poor man's basic -> extended converter.
+   pattern=$(echo $pattern | sed 's/\\//g')
+   fi
+   test_perf "$engine grep $pattern" "
+   git -c grep.patternType=$engine grep -- '$pattern' 
>'out.$engine' || :
+   "
+   done
+
+   test_expect_success "assert that all engines found the same for 
$pattern" "
+   test_cmp 'out.basic' 'out.extended' &&
+   test_cmp 'out.basic' 'out.perl'
+   "
+done
+
+test_done
-- 
2.11.0

[PATCH v2 15/29] perf: emit progress output when unpacking & building

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Amend the t/perf/run output so that in addition to the "Running N
tests" heading currently being emitted, it also emits "Unpacking $rev"
and "Building $rev" when setting up the build/$rev directory & when
building it, respectively.

This makes it easier to see what's going on and what revision is being
tested as the output scrolls by.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/perf/run | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/perf/run b/t/perf/run
index b61024a830..beb4acc0e4 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -24,6 +24,7 @@ run_one_dir () {
 
 unpack_git_rev () {
rev=$1
+   echo "=== Unpacking $rev in build/$rev ==="
mkdir -p build/$rev
(cd "$(git rev-parse --show-cdup)" && git archive --format=tar $rev) |
(cd build/$rev && tar x)
@@ -37,6 +38,7 @@ build_git_rev () {
cp "../../$config" "build/$rev/"
fi
done
+   echo "=== Building $rev ==="
(
cd build/$rev &&
if test -n "$GIT_PERF_MAKE_COMMAND"
-- 
2.11.0

[PATCH v2 18/29] grep: catch a missing enum in switch statement

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Add a die(...) to a default case for the switch statement selecting
between grep pattern types under --recurse-submodules.

Normally this would be caught by -Wswitch, but the grep_pattern_type
type is converted to int by going through parse_options(). Changing
the argument type passed to compile_submodule_options() won't work,
the value will just get coerced. The -Wswitch-default warning will
warn about it, but that produces a lot of noise across the codebase,
this potential issue would be drowned in that noise.

Thus catching this at runtime is the least worst option. This won't
ever trigger in practice, but if a new pattern type were to be added
this catches an otherwise silent bug during development.

See commit 0281e487fd ("grep: optionally recurse into submodules",
2016-12-16) for the initial addition of this code.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/grep.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/grep.c b/builtin/grep.c
index 3ffb5b4e81..a191e2976b 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -495,6 +495,8 @@ static void compile_submodule_options(const struct grep_opt 
*opt,
break;
case GREP_PATTERN_TYPE_UNSPECIFIED:
break;
+   default:
+   die("BUG: Added a new grep pattern type without updating switch 
statement");
}
 
for (pattern = opt->pattern_list; pattern != NULL;
-- 
2.11.0

[PATCH v2 10/29] grep: add tests for grep pattern types being passed to submodules

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Add testing for grep pattern types being correctly passed to
submodules. The pattern "(.|.)[\d]" matches differently under
fixed (not at all), and then matches different lines under
basic/extended & perl regular expressions, so this change asserts that
the pattern type is passed along correctly.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7814-grep-recurse-submodules.sh | 49 ++
 1 file changed, 49 insertions(+)

diff --git a/t/t7814-grep-recurse-submodules.sh 
b/t/t7814-grep-recurse-submodules.sh
index 1472855e1d..3a58197f47 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -313,4 +313,53 @@ test_incompatible_with_recurse_submodules ()
 test_incompatible_with_recurse_submodules --untracked
 test_incompatible_with_recurse_submodules --no-index
 
+test_expect_success 'grep --recurse-submodules should pass the pattern type 
along' '
+   # Fixed
+   test_must_fail git grep -F --recurse-submodules -e "(.|.)[\d]" &&
+   test_must_fail git -c grep.patternType=fixed grep --recurse-submodules 
-e "(.|.)[\d]" &&
+
+   # Basic
+   git grep -G --recurse-submodules -e "(.|.)[\d]" >actual &&
+   cat >expect <<-\EOF &&
+   a:(1|2)d(3|4)
+   submodule/a:(1|2)d(3|4)
+   submodule/sub/a:(1|2)d(3|4)
+   EOF
+   test_cmp expect actual &&
+   git -c grep.patternType=basic grep --recurse-submodules -e "(.|.)[\d]" 
>actual &&
+   test_cmp expect actual &&
+
+   # Extended
+   git grep -E --recurse-submodules -e "(.|.)[\d]" >actual &&
+   cat >expect <<-\EOF &&
+   .gitmodules:[submodule "submodule"]
+   .gitmodules:path = submodule
+   .gitmodules:url = ./submodule
+   a:(1|2)d(3|4)
+   submodule/.gitmodules:[submodule "sub"]
+   submodule/a:(1|2)d(3|4)
+   submodule/sub/a:(1|2)d(3|4)
+   EOF
+   test_cmp expect actual &&
+   git -c grep.patternType=extended grep --recurse-submodules -e 
"(.|.)[\d]" >actual &&
+   test_cmp expect actual &&
+   git -c grep.extendedRegexp=true grep --recurse-submodules -e 
"(.|.)[\d]" >actual &&
+   test_cmp expect actual &&
+
+   # Perl
+   if test_have_prereq PCRE
+   then
+   git grep -P --recurse-submodules -e "(.|.)[\d]" >actual &&
+   cat >expect <<-\EOF &&
+   a:(1|2)d(3|4)
+   b/b:(3|4)
+   submodule/a:(1|2)d(3|4)
+   submodule/sub/a:(1|2)d(3|4)
+   EOF
+   test_cmp expect actual &&
+   git -c grep.patternType=perl grep --recurse-submodules -e 
"(.|.)[\d]" >actual &&
+   test_cmp expect actual
+   fi
+'
+
 test_done
-- 
2.11.0

[PATCH v2 12/29] grep: prepare for testing binary regexes containing rx metacharacters

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Add setup code needed for testing regexes that contain both binary
data and regex metacharacters.

The POSIX regcomp() function inherently can't support that, because it
takes a \0-delimited char *, but other regex engines APIs like PCRE v2
take a pattern/length pair, and are thus able to handle \0s in
patterns as well as any other character.

When kwset was imported in commit 9eceddeec6 ("Use kwset in grep",
2011-08-21) this limitation was fixed, but at the expense of
introducing the undocumented limitation that any pattern containing \0
implicitly becomes a fixed match (equivalent to -F having been
provided).

That's not something we'd like to keep in the future. The inability to
match patterns containing \0 is a leaky implementation detail.

So add tests as a first step towards changing that. In order to test
that \0-patterns can properly match as regexes the test string needs
to have some regex metacharacters in it.

There were other blind spots in the tests. The code around kwset
specially handles case-insensitive & non-ASCII data, but there were no
tests for this.

Fix all of that by amending the text being matched to contain both
regex metacharacters & non-ASCII data.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7008-grep-binary.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index 70e7868829..e7754c3946 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -28,7 +28,7 @@ nul_match () {
 }
 
 test_expect_success 'setup' "
-   echo 'binaryQfile' | q_to_nul >a &&
+   echo 'binaryQfileQm[*]cQ*æQð' | q_to_nul >a &&
git add a &&
git commit -m.
 "
@@ -162,7 +162,7 @@ test_expect_success 'grep does not honor textconv' '
 '
 
 test_expect_success 'grep --textconv honors textconv' '
-   echo "a:binaryQfile" >expect &&
+   echo "a:binaryQfileQm[*]cQ*æQð" >expect &&
git grep --textconv Qfile >actual &&
test_cmp expect actual
 '
@@ -172,7 +172,7 @@ test_expect_success 'grep --no-textconv does not honor 
textconv' '
 '
 
 test_expect_success 'grep --textconv blob honors textconv' '
-   echo "HEAD:a:binaryQfile" >expect &&
+   echo "HEAD:a:binaryQfileQm[*]cQ*æQð" >expect &&
git grep --textconv Qfile HEAD:a >actual &&
test_cmp expect actual
 '
-- 
2.11.0

[PATCH v2 11/29] grep: add a test helper function for less verbose -f \0 tests

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Add a helper function to make the tests which check for patterns with
\0 in them more succinct. Right now this isn't a big win, but
subsequent commits will add a lot more of these tests.

The helper is based on the match() function in t3070-wildmatch.sh.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7008-grep-binary.sh | 58 +-
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index 9c9c378119..70e7868829 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -4,6 +4,29 @@ test_description='git grep in binary files'
 
 . ./test-lib.sh
 
+nul_match () {
+   status=$1
+   flags=$2
+   pattern=$3
+   pattern_human=$(echo "$pattern" | sed 's/Q//g')
+
+   if test "$status" = 1
+   then
+   test_expect_success "git grep -f f $flags '$pattern_human' a" "
+   printf '$pattern' | q_to_nul >f &&
+   git grep -f f $flags a
+   "
+   elif test "$status" = 0
+   then
+   test_expect_success "git grep -f f $flags '$pattern_human' a" "
+   printf '$pattern' | q_to_nul >f &&
+   test_must_fail git grep -f f $flags a
+   "
+   else
+   test_expect_success "PANIC: Test framework error. Unknown 
status $status" 'false'
+   fi
+}
+
 test_expect_success 'setup' "
echo 'binaryQfile' | q_to_nul >a &&
git add a &&
@@ -69,35 +92,12 @@ test_expect_failure 'git grep .fi a' '
git grep .fi a
 '
 
-test_expect_success 'git grep -F yf a' "
-   printf 'yQf' | q_to_nul >f &&
-   git grep -f f -F a
-"
-
-test_expect_success 'git grep -F yx a' "
-   printf 'yQx' | q_to_nul >f &&
-   test_must_fail git grep -f f -F a
-"
-
-test_expect_success 'git grep -Fi Yf a' "
-   printf 'YQf' | q_to_nul >f &&
-   git grep -f f -Fi a
-"
-
-test_expect_success 'git grep -Fi Yx a' "
-   printf 'YQx' | q_to_nul >f &&
-   test_must_fail git grep -f f -Fi a
-"
-
-test_expect_success 'git grep yf a' "
-   printf 'yQf' | q_to_nul >f &&
-   git grep -f f a
-"
-
-test_expect_success 'git grep yx a' "
-   printf 'yQx' | q_to_nul >f &&
-   test_must_fail git grep -f f a
-"
+nul_match 1 '-F' 'yQf'
+nul_match 0 '-F' 'yQx'
+nul_match 1 '-Fi' 'YQf'
+nul_match 0 '-Fi' 'YQx'
+nul_match 1 '' 'yQf'
+nul_match 0 '' 'yQx'
 
 test_expect_success 'grep respects binary diff attribute' '
echo text >t &&
-- 
2.11.0

[PATCH v2 03/29] test-lib: rename the LIBPCRE prerequisite to PCRE

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Rename the LIBPCRE prerequisite to PCRE. This is for preparation for
libpcre2 support, where having just "LIBPCRE" would be confusing as it
implies v1 of the library.

None of these tests are incompatible between versions 1 & 2 of
libpcre, it's less confusing to give them a more general name to make
it clear that they work on both library versions.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/README|  4 ++--
 t/t7810-grep.sh | 28 ++--
 t/t7812-grep-icase-non-ascii.sh |  4 ++--
 t/t7813-grep-icase-iso.sh   |  2 +-
 t/test-lib.sh   |  2 +-
 5 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/t/README b/t/README
index ab386c3681..a90cb62583 100644
--- a/t/README
+++ b/t/README
@@ -803,9 +803,9 @@ use these, and "test_set_prereq" for how to define your own.
Test is not run by root user, and an attempt to write to an
unwritable file is expected to fail correctly.
 
- - LIBPCRE
+ - PCRE
 
-   Git was compiled with USE_LIBPCRE=YesPlease. Wrap any tests
+   Git was compiled with support for PCRE. Wrap any tests
that use git-grep --perl-regexp or git-grep -P in these.
 
  - CASE_INSENSITIVE_FS
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index cee42097b0..c84c4d99f9 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -275,7 +275,7 @@ do
test_cmp expected actual
'
 
-   test_expect_success LIBPCRE "grep $L with grep.patterntype=perl" '
+   test_expect_success PCRE "grep $L with grep.patterntype=perl" '
echo "${HC}ab:a+b*c" >expected &&
git -c grep.patterntype=perl grep "a\x{2b}b\x{2a}c" $H ab 
>actual &&
test_cmp expected actual
@@ -1053,12 +1053,12 @@ hello.c:int main(int argc, const char **argv)
 hello.c:   printf("Hello world.\n");
 EOF
 
-test_expect_success LIBPCRE 'grep --perl-regexp pattern' '
+test_expect_success PCRE 'grep --perl-regexp pattern' '
git grep --perl-regexp "\p{Ps}.*?\p{Pe}" hello.c >actual &&
test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P pattern' '
+test_expect_success PCRE 'grep -P pattern' '
git grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual &&
test_cmp expected actual
 '
@@ -1070,13 +1070,13 @@ test_expect_success 'grep pattern with 
grep.extendedRegexp=true' '
test_cmp empty actual
 '
 
-test_expect_success LIBPCRE 'grep -P pattern with grep.extendedRegexp=true' '
+test_expect_success PCRE 'grep -P pattern with grep.extendedRegexp=true' '
git -c grep.extendedregexp=true \
grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual &&
test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P -v pattern' '
+test_expect_success PCRE 'grep -P -v pattern' '
{
echo "ab:a+b*c"
echo "ab:a+bc"
@@ -1085,7 +1085,7 @@ test_expect_success LIBPCRE 'grep -P -v pattern' '
test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P -i pattern' '
+test_expect_success PCRE 'grep -P -i pattern' '
cat >expected <<-EOF &&
hello.c:printf("Hello world.\n");
EOF
@@ -1093,7 +1093,7 @@ test_expect_success LIBPCRE 'grep -P -i pattern' '
test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P -w pattern' '
+test_expect_success PCRE 'grep -P -w pattern' '
{
echo "hello_world:Hello world"
echo "hello_world:HeLLo world"
@@ -1118,11 +1118,11 @@ test_expect_success 'grep invalidpattern properly dies 
with grep.patternType=ext
test_must_fail git -c grep.patterntype=extended grep "a["
 '
 
-test_expect_success LIBPCRE 'grep -P invalidpattern properly dies ' '
+test_expect_success PCRE 'grep -P invalidpattern properly dies ' '
test_must_fail git grep -P "a["
 '
 
-test_expect_success LIBPCRE 'grep invalidpattern properly dies with 
grep.patternType=perl' '
+test_expect_success PCRE 'grep invalidpattern properly dies with 
grep.patternType=perl' '
test_must_fail git -c grep.patterntype=perl grep "a["
 '
 
@@ -1191,13 +1191,13 @@ test_expect_success 'grep pattern with 
grep.patternType=fixed, =basic, =perl, =e
test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -G -F -E -P pattern' '
+test_expect_success PCRE 'grep -G -F -E -P pattern' '
echo "d0:0" >expected &&
git grep -G -F -E -P "[\d]" d0 >actual &&
test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep pattern with grep.patternType=fixed, =basic, 
=extended, =perl' '
+test_expect_success PCRE 'grep pattern with grep.patternType=fixed, =basic, 
=extended, =perl' '
echo "d0:0" >expected &&
git \
-c grep.patterntype=fixed \
@@ -1208,7 +1208,7 @@ test_expect_success LIBPCRE 'grep pattern with 
grep.patternType=fixed, =basic, =
test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P pattern with

[PATCH v2 05/29] grep: add a test asserting that --perl-regexp dies when !PCRE

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Add a test asserting that when --perl-regexp (and -P for grep) is
given to git-grep & git-log that we die with an error.

In developing the PCRE v2 series I introduced a regression where -P
would (through control-flow fall-through) become synonymous with basic
POSIX matching. I.e. 'git grep -P '[\d]' would match "d" instead of
digits.

The entire test suite would still pass with this serious regression,
since everything that tested for --perl-regexp would be guarded by the
PCRE prerequisite, fix that blind-spot by adding tests under !PCRE
asserting that git must die when given --perl-regexp or -P.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t4202-log.sh  |  4 +++-
 t/t7810-grep.sh | 12 
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index e522a2fcd5..9680dfe400 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -364,7 +364,9 @@ test_expect_success 'log with various grep.patternType 
configurations & command-
git log --pretty=tformat:%s --perl-regexp \
--grep="[\d]\|" >actual.perl.long-arg &&
test_cmp expect.perl actual.perl.long-arg
-
+   else
+   test_must_fail git log --perl-regexp \
+   --grep="[\d]\|"
fi &&
test_cmp expect.fixed actual.fixed.long-arg &&
test_cmp expect.basic actual.basic.long-arg &&
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index c84c4d99f9..8d69113695 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -281,6 +281,10 @@ do
test_cmp expected actual
'
 
+   test_expect_success !PCRE "grep $L with grep.patterntype=perl errors 
without PCRE" '
+   test_must_fail git -c grep.patterntype=perl grep "foo.*bar"
+   '
+
test_expect_success "grep $L with grep.patternType=default and 
grep.extendedRegexp=true" '
echo "${HC}ab:abc" >expected &&
git \
@@ -1058,11 +1062,19 @@ test_expect_success PCRE 'grep --perl-regexp pattern' '
test_cmp expected actual
 '
 
+test_expect_success !PCRE 'grep --perl-regexp pattern errors without PCRE' '
+   test_must_fail git grep --perl-regexp "foo.*bar"
+'
+
 test_expect_success PCRE 'grep -P pattern' '
git grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual &&
test_cmp expected actual
 '
 
+test_expect_success !PCRE 'grep -P pattern errors without PCRE' '
+   test_must_fail git grep -P "foo.*bar"
+'
+
 test_expect_success 'grep pattern with grep.extendedRegexp=true' '
>empty &&
test_must_fail git -c grep.extendedregexp=true \
-- 
2.11.0

[PATCH v2 09/29] grep: amend submodule recursion test for regex engine testing

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Amend the submodule recursion test to prepare it for subsequent tests
of whether it passes along the grep.patternType to the submodule
greps.

This is the result of searching & replacing:

foobar -> (1|2)d(3|4)
foo-> (1|2)
bar-> (3|4)

Currently there's no tests for whether e.g. -P or -E is correctly
passed along, tests for that will be added in a follow-up change, but
first add content to the tests which will match differently under
different regex engines.

Reuse the pattern established in an earlier commit of mine in this
series ("log: add exhaustive tests for pattern style options &
config", 2017-04-07). The pattern "(.|.)[\d]" will match this content
differently under fixed/basic/extended & perl.

This test code was originally added in commit 0281e487fd ("grep:
optionally recurse into submodules", 2016-12-16).

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7814-grep-recurse-submodules.sh | 166 ++---
 1 file changed, 83 insertions(+), 83 deletions(-)

diff --git a/t/t7814-grep-recurse-submodules.sh 
b/t/t7814-grep-recurse-submodules.sh
index 5b6eb3a65e..1472855e1d 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -9,13 +9,13 @@ submodules.
 . ./test-lib.sh
 
 test_expect_success 'setup directory structure and submodule' '
-   echo "foobar" >a &&
+   echo "(1|2)d(3|4)" >a &&
mkdir b &&
-   echo "bar" >b/b &&
+   echo "(3|4)" >b/b &&
git add a b &&
git commit -m "add a and b" &&
git init submodule &&
-   echo "foobar" >submodule/a &&
+   echo "(1|2)d(3|4)" >submodule/a &&
git -C submodule add a &&
git -C submodule commit -m "add a" &&
git submodule add ./submodule &&
@@ -24,18 +24,18 @@ test_expect_success 'setup directory structure and 
submodule' '
 
 test_expect_success 'grep correctly finds patterns in a submodule' '
cat >expect <<-\EOF &&
-   a:foobar
-   b/b:bar
-   submodule/a:foobar
+   a:(1|2)d(3|4)
+   b/b:(3|4)
+   submodule/a:(1|2)d(3|4)
EOF
 
-   git grep -e "bar" --recurse-submodules >actual &&
+   git grep -e "(3|4)" --recurse-submodules >actual &&
test_cmp expect actual
 '
 
 test_expect_success 'grep and basic pathspecs' '
cat >expect <<-\EOF &&
-   submodule/a:foobar
+   submodule/a:(1|2)d(3|4)
EOF
 
git grep -e. --recurse-submodules -- submodule >actual &&
@@ -44,7 +44,7 @@ test_expect_success 'grep and basic pathspecs' '
 
 test_expect_success 'grep and nested submodules' '
git init submodule/sub &&
-   echo "foobar" >submodule/sub/a &&
+   echo "(1|2)d(3|4)" >submodule/sub/a &&
git -C submodule/sub add a &&
git -C submodule/sub commit -m "add a" &&
git -C submodule submodule add ./sub &&
@@ -54,117 +54,117 @@ test_expect_success 'grep and nested submodules' '
git commit -m "updated submodule" &&
 
cat >expect <<-\EOF &&
-   a:foobar
-   b/b:bar
-   submodule/a:foobar
-   submodule/sub/a:foobar
+   a:(1|2)d(3|4)
+   b/b:(3|4)
+   submodule/a:(1|2)d(3|4)
+   submodule/sub/a:(1|2)d(3|4)
EOF
 
-   git grep -e "bar" --recurse-submodules >actual &&
+   git grep -e "(3|4)" --recurse-submodules >actual &&
test_cmp expect actual
 '
 
 test_expect_success 'grep and multiple patterns' '
cat >expect <<-\EOF &&
-   a:foobar
-   submodule/a:foobar
-   submodule/sub/a:foobar
+   a:(1|2)d(3|4)
+   submodule/a:(1|2)d(3|4)
+   submodule/sub/a:(1|2)d(3|4)
EOF
 
-   git grep -e "bar" --and -e "foo" --recurse-submodules >actual &&
+   git grep -e "(3|4)" --and -e "(1|2)" --recurse-submodules >actual &&
test_cmp expect actual
 '
 
 test_expect_success 'grep and multiple patterns' '
cat >expect <<-\EOF &&
-   b/b:bar
+   b/b:(3|4)
EOF
 
-   git grep -e "bar" --and --not -e "foo" --recurse-submodules >actual &&
+   git grep -e "(3|4)" --and --not -e "(1|2)" --recurse-submodules >actual 
&&
test_cmp expect actual
 '
 
 test_expect_success 'basic grep tree' '
cat >expect <<-\EOF &&
-   HEAD:a:foobar
-   HEAD:b/b:bar
-   HEAD:submodule/a:foobar
-   HEAD:submodule/sub/a:foobar
+   HEAD:a:(1|2)d(3|4)
+   HEAD:b/b:(3|4)
+   HEAD:submodule/a:(1|2)d(3|4)
+   HEAD:submodule/sub/a:(1|2)d(3|4)
EOF
 
-   git grep -e "bar" --recurse-submodules HEAD >actual &&
+   git grep -e "(3|4)" --recurse-submodules HEAD >actual &&
test_cmp expect actual
 '
 
 test_expect_success 'grep tree HEAD^' '
cat >expect <<-\EOF &&
-   HEAD^:a:foobar
-   HEAD^:b/b:bar
-   HEAD^:submodule/a:foobar
+   HEAD^:a:(1|2)d(3|4)
+   HEAD^:b/b:(3|4)
+   HEAD^:submodule/a:(1|2)d(3|4)
EOF
 
-   git grep -e "bar" --recurse-submodules HEAD^ >actual &&
+   git grep

[PATCH v2 02/29] grep & rev-list doc: stop promising libpcre for --perl-regexp

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Stop promising in our grep & rev-list options documentation that we're
always going to be using libpcre when given the --perl-regexp option.

Instead talk about using "Perl-compatible regular expressions" and
using these types of patterns using "a compile-time dependency".

Saying "libpcre" means that we're talking about libpcre.so, which is
always going to be v1. This change is part of an ongoing saga to add
support for libpcre2, which comes with PCRE v2.

In the future we might use some completely unrelated library to
provide perl-compatible regular expression support. By wording the
documentation differently and not promising any specific version of
PCRE or even PCRE at all we have more wiggle room to change the
implementation.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/git-grep.txt | 7 +--
 Documentation/rev-list-options.txt | 8 ++--
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 71f32f3508..5033483db4 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -161,8 +161,11 @@ OPTIONS
 
 -P::
 --perl-regexp::
-   Use Perl-compatible regexp for patterns. Requires libpcre to be
-   compiled in.
+   Use Perl-compatible regular expressions for patterns.
++
+Support for these types of regular expressions is an optional
+compile-time dependency. If Git wasn't compiled with support for them
+providing this option will cause it to die.
 
 -F::
 --fixed-strings::
diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index a02f7324c0..a46f70c2b1 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -92,8 +92,12 @@ endif::git-rev-list[]
pattern as a regular expression).
 
 --perl-regexp::
-   Consider the limiting patterns to be Perl-compatible regular 
expressions.
-   Requires libpcre to be compiled in.
+   Consider the limiting patterns to be Perl-compatible regular
+   expressions.
++
+Support for these types of regular expressions is an optional
+compile-time dependency. If Git wasn't compiled with support for them
+providing this option will cause it to die.
 
 --remove-empty::
Stop when a given path disappears from the tree.
-- 
2.11.0

[PATCH v2 07/29] grep: change non-ASCII -i test to stop using --debug

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Change a non-ASCII case-insensitive test case to stop using --debug,
and instead simply test for the expected results.

The test coverage remains the same with this change, but the test
won't break due to internal refactoring.

This test was added in commit 793dc676e0 ("grep/icase: avoid kwsset
when -F is specified", 2016-06-25). It was asserting that the regex
must be compiled with compile_fixed_regexp(), instead test for the
expected results, allowing the underlying implementation to change.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7812-grep-icase-non-ascii.sh | 25 +
 1 file changed, 5 insertions(+), 20 deletions(-)

diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh
index 04a61cb8e0..0059a1f837 100755
--- a/t/t7812-grep-icase-non-ascii.sh
+++ b/t/t7812-grep-icase-non-ascii.sh
@@ -36,29 +36,14 @@ test_expect_success GETTEXT_LOCALE,PCRE 'grep pcre utf-8 
string with "+"' '
 '
 
 test_expect_success REGEX_LOCALE 'grep literal string, with -F' '
-   git grep --debug -i -F "TILRAUN: Halló Heimur!"  2>&1 >/dev/null |
-grep fixed >debug1 &&
-   test_write_lines "fixed TILRAUN: Halló Heimur!" >expect1 &&
-   test_cmp expect1 debug1 &&
-
-   git grep --debug -i -F "TILRAUN: HALLÓ HEIMUR!"  2>&1 >/dev/null |
-grep fixed >debug2 &&
-   test_write_lines "fixed TILRAUN: HALLÓ HEIMUR!" >expect2 &&
-   test_cmp expect2 debug2
+   git grep -i -F "TILRAUN: Halló Heimur!" &&
+   git grep -i -F "TILRAUN: HALLÓ HEIMUR!"
 '
 
 test_expect_success REGEX_LOCALE 'grep string with regex, with -F' '
-   test_write_lines "^*TILR^AUN:.* \\Halló \$He[]imur!\$" >file &&
-
-   git grep --debug -i -F "^*TILR^AUN:.* \\Halló \$He[]imur!\$" 2>&1 
>/dev/null |
-grep fixed >debug1 &&
-   test_write_lines "fixed \\^*TILR^AUN:\\.\\* Halló 
\$He\\[]imur!\\\$" >expect1 &&
-   test_cmp expect1 debug1 &&
-
-   git grep --debug -i -F "^*TILR^AUN:.* \\HALLÓ \$HE[]IMUR!\$"  2>&1 
>/dev/null |
-grep fixed >debug2 &&
-   test_write_lines "fixed \\^*TILR^AUN:\\.\\* HALLÓ 
\$HE\\[]IMUR!\\\$" >expect2 &&
-   test_cmp expect2 debug2
+   test_write_lines "TILRAUN: Halló Heimur [abc]!" >file3 &&
+   git add file3 &&
+   git grep -i -F "TILRAUN: Halló Heimur [abc]!" file3
 '
 
 test_expect_success REGEX_LOCALE 'pickaxe -i on non-ascii' '
-- 
2.11.0

[PATCH v2 06/29] grep: add a test for backreferences in PCRE patterns

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Add a test for backreferences such as (.)\1 in PCRE patterns. This
test ensures that the PCRE_NO_AUTO_CAPTURE option isn't turned
on. Before this change turning it on would break these sort of
patterns, but wouldn't break any tests.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7810-grep.sh | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 8d69113695..daa906b9b0 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -1114,6 +1114,13 @@ test_expect_success PCRE 'grep -P -w pattern' '
test_cmp expected actual
 '
 
+test_expect_success PCRE 'grep -P backreferences work (the PCRE 
NO_AUTO_CAPTURE flag is not set)' '
+   git grep -P -h "(?P.)(?P=one)" hello_world >actual &&
+   test_cmp hello_world actual &&
+   git grep -P -h "(.)\1" hello_world >actual &&
+   test_cmp hello_world actual
+'
+
 test_expect_success 'grep -G invalidpattern properly dies ' '
test_must_fail git grep -G "a["
 '
-- 
2.11.0

[PATCH v2 08/29] grep: add tests for --threads=N and grep.threads

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Add tests for --threads=N being supplied on the command-line, or when
grep.threads=N being supplied in the configuration.

When the threading support was made run-time configurable in commit
89f09dd34e ("grep: add --threads= option and grep.threads
configuration", 2015-12-15) no tests were added for it.

In developing a change to the grep code I was able to make
'--threads=1 ` segfault, while the test suite still passed. This
change fixes that blind spot in the tests.

In addition to asserting that asking for N threads shouldn't segfault,
test that the grep output given any N is the same.

The choice to test only 1..10 as opposed to 1..8 or 1..16 or whatever
is arbitrary. Testing 1..1024 works locally for me (but gets
noticeably slower as more threads are spawned). Given the structure of
the code there's no reason to test an arbitrary number of threads,
only 0, 1 and >=2 are special modes of operation.

A later patch introduces a PTHREADS test prerequisite which is true
under NO_PTHREADS=UnfortunatelyYes, but even under NO_PTHREADS it's
fine to test --threads=N, we'll just ignore it and not use
threading. So these tests also make sense under that mode to assert
that --threads=N without pthreads still returns expected results.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7810-grep.sh | 16 
 1 file changed, 16 insertions(+)

diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index daa906b9b0..561709ef6a 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -775,6 +775,22 @@ test_expect_success 'grep -W with userdiff' '
test_cmp expected actual
 '
 
+for threads in $(test_seq 0 10)
+do
+   test_expect_success "grep --threads=$threads & -c 
grep.threads=$threads" "
+   git grep --threads=$threads . >actual.$threads &&
+   if test $threads -ge 1
+   then
+   test_cmp actual.\$(($threads - 1)) actual.$threads
+   fi &&
+   git -c grep.threads=$threads grep . >actual.$threads &&
+   if test $threads -ge 1
+   then
+   test_cmp actual.\$(($threads - 1)) actual.$threads
+   fi
+   "
+done
+
 test_expect_success 'grep from a subdirectory to search wider area (1)' '
mkdir -p s &&
(
-- 
2.11.0

[PATCH v2 01/29] Makefile & configure: reword inaccurate comment about PCRE

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Reword an outdated & inaccurate comment which suggests that only
git-grep can use PCRE.

This comment was added back when PCRE support was initially added in
commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09), and was true
at the time.

It hasn't been telling the full truth since git-log learned to use
PCRE with --grep in commit 727b6fc3ed ("log --grep: accept
--basic-regexp and --perl-regexp", 2012-10-03), and more importantly
is likely to get more inaccurate over time as more use is made of PCRE
in other areas.

Reword it to be more future-proof, and to more clearly explain that
this enables user-initiated runtime behavior.

Copy/pasting this so much in configure.ac is lame, these Makefile-like
flags aren't even used by autoconf, just the corresponding
--with[out]-* options. But copy/pasting the comments that make sense
for the Makefile to configure.ac where they make less sense is the
pattern everything else follows in that file. I'm not going to war
against that as part of this change, just following the existing
pattern.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile |  6 --
 configure.ac | 12 
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index e35542e631..eedadb8056 100644
--- a/Makefile
+++ b/Makefile
@@ -24,8 +24,10 @@ all::
 # Define NO_OPENSSL environment variable if you do not have OpenSSL.
 # This also implies BLK_SHA1.
 #
-# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
-# able to use Perl-compatible regular expressions.
+# Define USE_LIBPCRE if you have and want to use libpcre. Various
+# commands such as log and grep offer runtime options to use
+# Perl-compatible regular expressions instead of standard or extended
+# POSIX regular expressions.
 #
 # Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
 # /foo/bar/include and /foo/bar/lib directories.
diff --git a/configure.ac b/configure.ac
index 128165529f..deeb968daa 100644
--- a/configure.ac
+++ b/configure.ac
@@ -250,8 +250,10 @@ AS_HELP_STRING([--with-openssl],[use OpenSSL library 
(default is YES)])
 AS_HELP_STRING([],  [ARG can be prefix for openssl library and 
headers]),
 GIT_PARSE_WITH([openssl]))
 
-# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
-# able to use Perl-compatible regular expressions.
+# Define USE_LIBPCRE if you have and want to use libpcre. Various
+# commands such as log and grep offer runtime options to use
+# Perl-compatible regular expressions instead of standard or extended
+# POSIX regular expressions.
 #
 # Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
 # /foo/bar/include and /foo/bar/lib directories.
@@ -499,8 +501,10 @@ GIT_CONF_SUBST([NEEDS_SSL_WITH_CRYPTO])
 GIT_CONF_SUBST([NO_OPENSSL])
 
 #
-# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
-# able to use Perl-compatible regular expressions.
+# Define USE_LIBPCRE if you have and want to use libpcre. Various
+# commands such as log and grep offer runtime options to use
+# Perl-compatible regular expressions instead of standard or extended
+# POSIX regular expressions.
 #
 
 if test -n "$USE_LIBPCRE"; then
-- 
2.11.0

[PATCH v2 04/29] log: add exhaustive tests for pattern style options & config

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Add exhaustive tests for how the different grep.patternType options &
the corresponding command-line options affect git-log.

Before this change it was possible to patch revision.c so that the
--basic-regexp option was synonymous with --extended-regexp, and
--perl-regexp wasn't recognized at all, and still have 100% of the
test suite pass.

This was because the first test being modified here, added in commit
34a4ae55b2 ("log --grep: use the same helper to set -E/-F options as
"git grep"", 2012-10-03), didn't actually check whether we'd enabled
extended regular expressions as distinct from re-toggling non-fixed
string support.

Fix that by changing the pattern to a pattern that'll only match if
--extended-regexp option is provided, but won't match under the
default --basic-regexp option.

Other potential regressions were possible since there were no tests
for the rest of the combinations of grep.patternType configuration
toggles & corresponding git-log command-line options. Add exhaustive
tests for those.

The patterns being passed to fixed/basic/extended/PCRE are carefully
crafted to return the wrong thing if the grep engine were to pick any
other matching method than the one it's told to use.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t4202-log.sh | 94 +-
 1 file changed, 93 insertions(+), 1 deletion(-)

diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index f577990716..e522a2fcd5 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -262,7 +262,28 @@ test_expect_success 'log --grep -i' '
 
 test_expect_success 'log -F -E --grep= uses ere' '
echo second >expect &&
-   git log -1 --pretty="tformat:%s" -F -E --grep=s.c.nd >actual &&
+   # basic would need \(s\) to do the same
+   git log -1 --pretty="tformat:%s" -F -E --grep="(s).c.nd" >actual &&
+   test_cmp expect actual
+'
+
+test_expect_success PCRE 'log -F -E --perl-regexp --grep= uses PCRE' '
+   test_when_finished "rm -rf num_commits" &&
+   git init num_commits &&
+   (
+   cd num_commits &&
+   test_commit 1d &&
+   test_commit 2e
+   ) &&
+   echo 2e >expect &&
+   # In PCRE \d in [\d] is like saying "0-9", and matches the 2
+   # in 2e...
+   git -C num_commits log -1 --pretty="tformat:%s" -F -E --perl-regexp 
--grep="[\d]" >actual &&
+   test_cmp expect actual &&
+   echo 1d >expect &&
+   # ...in POSIX basic & extended it is the same as [d],
+   # i.e. "d", which matches 1d, but not and does not match 2e.
+   git -C num_commits log -1 --pretty="tformat:%s" -F -E --grep="[\d]" 
>actual &&
test_cmp expect actual
 '
 
@@ -280,6 +301,77 @@ test_expect_success 'log with grep.patternType 
configuration and command line' '
test_cmp expect actual
 '
 
+test_expect_success 'log with various grep.patternType configurations & 
command-lines' '
+   git init pattern-type &&
+   (
+   cd pattern-type &&
+   test_commit 1 file A &&
+
+   # The tagname is overridden here because creating a
+   # tag called "(1|2)" as test_commit would otherwise
+   # implicitly do would fail on e.g. MINGW.
+   test_commit "(1|2)" file B 2 &&
+
+   echo "(1|2)" >expect.fixed &&
+   cp expect.fixed expect.basic &&
+   cp expect.fixed expect.extended &&
+   cp expect.fixed expect.perl &&
+
+   # A strcmp-like match with fixed.
+   git -c grep.patternType=fixed log --pretty=tformat:%s \
+   --grep="(1|2)" >actual.fixed &&
+
+   # POSIX basic matches (, | and ) literally.
+   git -c grep.patternType=basic log --pretty=tformat:%s \
+   --grep="(.|.)" >actual.basic &&
+
+   # POSIX extended needs to have | escaped to match it
+   # literally, whereas under basic this is the same as
+   # (|2), i.e. it would also match "1". This test checks
+   # for extended by asserting that it is not matching
+   # what basic would match.
+   git -c grep.patternType=extended log --pretty=tformat:%s \
+   --grep="\|2" >actual.extended &&
+   if test_have_prereq PCRE
+   then
+   # Only PCRE would match [\d]\| with only
+   # "(1|2)" due to [\d]. POSIX basic would match
+   # both it and "1", and POSIX extended would
+   # match neither.
+   git -c grep.patternType=perl log --pretty=tformat:%s \
+   --grep="[\d]\|" >actual.perl &&
+   test_cmp expect.perl actual.perl
+   fi &&
+   test_cmp expect.fixed actual.fixed &&
+   test_cmp expect.basic actual.basic &&
+   test_cmp

[PATCH v2 00/29] Easy to review grep & pre-PCRE changes

2017-05-13 Thread Ævar Arnfjörð Bjarmason

Easy to review? 29 patches? Are you kidding me?!

As noted in v1 (<20170511091829.5634-1-ava...@gmail.com>;
https://public-inbox.org/git/20170511091829.5634-1-ava...@gmail.com/)
these are all doc, test, refactoring etc. changes needed by the
subsequent "PCRE v2, PCRE v1 JIT, log -P & fixes" series.

Thanks a lot for the review everyone. This fixes all the issues
raised. Changes noted below, with names prefixed by the person who
raised the issue.

Ævar Arnfjörð Bjarmason (29):
  Makefile & configure: reword inaccurate comment about PCRE
  grep & rev-list doc: stop promising libpcre for --perl-regexp
  test-lib: rename the LIBPCRE prerequisite to PCRE

No changes.

  log: add exhaustive tests for pattern style options & config

Johannes: Now doesn't create a "(1|2)" tag, so should work on Windows
& beyond (wasn't needed, just created as a side-effect of test_commit)

Junio: Added comments for tricky basic/extended/perl

Junio: Moved all the 'test_have_prereq PCRE' test / test_cmp code
together, not apart as before.

  grep: add a test asserting that --perl-regexp dies when !PCRE
  grep: add a test for backreferences in PCRE patterns

No changes.

  grep: change non-ASCII -i test to stop using --debug

Brandon: Removed stray leftover unused --debug

  grep: add tests for --threads=N and grep.threads

Brandon: Amended commit message to clarify that this test doesn't need
a NO_PTHREADS prerequisite, and we actually get coverage out of
testing with --threads=N when not with threads, or at least it doesn't
harm anything.

  grep: amend submodule recursion test for regex engine testing

Junio: Now "foo" -> "(1|2)" as the commit message claims, not ->
"(1|2)d".

  grep: add tests for grep pattern types being passed to submodules

No changes.

  grep: add a test helper function for less verbose -f \0 tests

Junio: "nul_match() {" -> "nul_match () {" & quote "$status" & don't
quote non-variable strings.

  grep: prepare for testing binary regexes containing rx metacharacters

No changes.

  grep: add tests to fix blind spots with \0 patterns

Junio: Also fixed quoted variable strings here as above.

  perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do
  perf: emit progress output when unpacking & building

No changes.

  perf: add a performance comparison test of grep -G, -E and -P

All my multibyte performance tests were done with the string
'm(ú|u)ult.b(æ|y)te' which didn't match anything in the kernel, now
done with 'm(ú|u)lt.b(æ|y)te' instead.

I re-ran all the performance tests mentioned in the commit messages
where applicable.

  perf: add a performance comparison of fixed-string grep

One test_cmp was run twice due to rebasing from the pcre1/pcre2 days
of this series. Fixed.

  grep: catch a missing enum in switch statement

Stefan: Removed the comment about die(..BUG) & put the relevant detail
in the commit message instead.

  grep: remove redundant regflags assignment under PCRE
  grep: remove redundant `regflags &= ~REG_EXTENDED` assignments

No changes.

  grep: factor test for \0 in grep patterns into a function

Brandon: Fix comment syntax creating the function, and move it to the
correct place now instead of later in the "move is_fixed()" commit.

  grep: change the internal PCRE macro names to be PCRE1
  grep: change internal *pcre* variable & function names to be *pcre1*

No changes.

  grep: move is_fixed() earlier to avoid forward declaration

Brandon: Now just moves is_fixed() instead of is_fixed() & has_null()

  test-lib: add a PTHREADS prerequisite
  pack-objects & index-pack: add test for --threads warning
  pack-objects: fix buggy warning about threads

No changes.

  grep: given --threads with NO_PTHREADS=YesPlease, warn

Use Git standard comment syntax for TRANSLATORS comment.

  grep: assert that threading is enabled when calling grep_{lock,unlock}

 Documentation/git-grep.txt |   7 +-
 Documentation/rev-list-options.txt |   8 +-
 Makefile   |  14 ++-
 builtin/grep.c |  23 +++-
 builtin/pack-objects.c |   4 +-
 configure.ac   |  12 ++-
 grep.c | 108 ++-
 grep.h |  10 +-
 t/README   |   8 +-
 t/perf/README  |  19 +++-
 t/perf/p7820-grep-engines.sh   |  35 ++
 t/perf/p7821-grep-engines-fixed.sh |  26 +
 t/perf/run |  13 ++-
 t/t4202-log.sh |  96 -
 t/t5300-pack-object.sh |  33 ++
 t/t7008-grep-binary.sh | 135 +--
 t/t7810-grep.sh|  81 +++---
 t/t7812-grep-icase-non-ascii.sh|  29 ++---
 t/t7813-grep-icase-iso.sh  |   2 +-
 t/t7814-grep-recurse-submodules.sh | 215 +++--
 t/test-lib.sh  |   3 +-
 21 files changed, 646 insertions(+), 235 deletions(-)
 create mode 100755

Re: [PATCH] Use https links to Wikipedia to avoid http redirects

2017-05-13 Thread Ævar Arnfjörð Bjarmason

On Sat, May 13, 2017 at 11:54 AM, Sven Strickroth  wrote:
> Signed-off-by: Sven Strickroth 

Thanks! FWIW:

Reviewed-by: Ævar Arnfjörð Bjarmason 

> ---
>  Documentation/gitweb.txt | 2 +-
>  bisect.c | 2 +-
>  gitweb/gitweb.perl   | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/gitweb.txt b/Documentation/gitweb.txt
> index 96156e5e1..88450589a 100644
> --- a/Documentation/gitweb.txt
> +++ b/Documentation/gitweb.txt
> @@ -84,7 +84,7 @@ separator (rules for Perl's "`split(" ", $line)`").
>
>  * Fields use modified URI encoding, defined in RFC 3986, section 2.1
>  (Percent-Encoding), or rather "Query string encoding" (see
> -http://en.wikipedia.org/wiki/Query_string#URL_encoding[]), the difference
> +https://en.wikipedia.org/wiki/Query_string#URL_encoding[]), the difference
>  being that SP (" ") can be encoded as "{plus}" (and therefore "{plus}" has 
> to be
>  also percent-encoded).
>  +
> diff --git a/bisect.c b/bisect.c
> index 08c9fb726..fa477a3e2 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -546,7 +546,7 @@ static unsigned get_prn(unsigned count) {
>
>  /*
>   * Custom integer square root from
> - * http://en.wikipedia.org/wiki/Integer_square_root
> + * https://en.wikipedia.org/wiki/Integer_square_root
>   */
>  static int sqrti(int val)
>  {
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 7cf68f07b..d8209c7a0 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -8085,7 +8085,7 @@ sub git_search_help {
>  Pattern is by default a normal string that is matched 
> precisely (but without
>  regard to case, except in the case of pickaxe). However, when you check the 
> re checkbox,
>  the pattern entered is recognized as the POSIX extended
> -http://en.wikipedia.org/wiki/Regular_expression;>regular 
> expression (also case
> +https://en.wikipedia.org/wiki/Regular_expression;>regular 
> expression (also case
>  insensitive).
>  
>  commit
> --
> 2.12.1.windows.1
>

Re: [PATCH] compat/regex: fix compilation on Windows

2017-05-13 Thread Ævar Arnfjörð Bjarmason

On Sat, May 13, 2017 at 8:30 PM, Johannes Schindelin
 wrote:
> Hi Ævar,
>
> I originally replied in a very verbose manner, going step by step through
> the "one-liner", but decided to rephrase everything.
>
> So here goes.
>
> On Sat, 13 May 2017, Ævar Arnfjörð Bjarmason wrote:
>
>> Let's drop this current gawk import series.
>
> Well, the reason why you imported the current gawk regex is that we (you?)
> decided originally that it'd be easier to go with that one rather than
> with the gnulib one (which they friendly-forked). If you switch to gnulib,
> I would like to see (in the commit message) the original reason we picked
> gawk (which forked gnulib's regex code, after all), and why that reason no
> longer applies.

It was just easier to extract the code from gawk than gnulib at the
time. The thread starting at <20100715220059.GA3312@burratino> has
some details.

> I also would strongly prefer a *non* one-liner, in a commit message that
> adds the relevant source code from gawk or gnulib *verbatim*, followed by
> patches that adjust the code to Git's needs. That way, a future update can
> repeat the commands outlined in the first commit message and then
> cherry-pick the subsequent patches.
>
> And remember that you do not need to clone the entire repository. You can
> 1) fetch into the current one, and 2) use a shallow fetch. Example:
>
> git fetch --depth 1 http://git.savannah.gnu.org/r/gawk.git \
> gawk-4.1.4
>
> The next commands could be something like
>
> git read-tree --prefix=compat/regex/ FETCH_HEAD:
> git clean -dfx -- compat/regex/

Sure, if I submit this again I'll update whatever one liner is in
there to import it via git.

> Oh, and please do not use `master`. If Git is any indication, a tagged
> release will most often be a bit more robust than any in-between commit.

The reason to use the master branch of gawk is that since the last
release only one trivial change has been made to it, but the files
were all renamed as well, so it was easier for that one-liner in the
README to use master than the latest release.

If we move to gnulib we'll also use the master branch, because that's
what you're supposed to do according to their docs, they don't make
releases.

Re: [PATCH] fixup! log: add exhaustive tests for pattern style options & config

2017-05-13 Thread Johannes Schindelin

Hi Jonathan,

On Fri, 12 May 2017, Jonathan Nieder wrote:

> Johannes Schindelin wrote:
> 
> > On Windows, `(1|2)` is not a valid file name, and therefore the tag
> > cannot be created as expected by the new test.
> >
> > So simply skip this test on Windows.
> >
> > Signed-off-by: Johannes Schindelin 
> > ---
> 
> Reviewed-by: Jonathan Nieder 
> 
> I wouldn't be surprised if there are filesystems used places other
> than MINGW that also can't handle this test.  Isn't this what some
> tests use a FUNNYNAMES prerequisite for?

Some tests do that. And they all implement that prereq themselves,
differently.

t3600 tries to create a file whose name contains a tab.

t4135 tries to create a file whose name contains a tab, but only if the
MINGW prereq is set.

t9903 tries to create a file whose name contains a newline.

It seems, therefore, that the existing FUNNYNAMES are already in
disagreement, and as you point out below, the file name in question
contains neither a tab nor a newline. So we would introduce *yet another*
disagreeing FUNNYNAMES.

Besides, in some hopefully not so far future, our default refs backend may
not depend on the current platform's file system's idionsyncracies. So
maybe we do not even want the MINGW nor the FUNNYNAMES prereq...

> In this example, it's the pipe that's not allowed, not the
> parenthesis, right?  (At least I have some memories of naming files
> with some parentheses.)

You would be correct in that assumptions. Long names on NTFS can contain
parentheses.

> Would something like
> 
>   test PIPE_IN_FILENAME '
>   >"a|b" &&
>   test -f "a|b"
>   '
> 
> work?

No.

Remember: on Windows, there is no Unix shell.

(Actually, there is Bash on Ubuntu on Windows, and with the Creators
Update, it became truly awesome, I am a big fan of it. But that is besides
the point here: Git for Windows needs to support Windows versions older
than Windows 10, where Bash on Ubuntu on Windows is simply unavailable.)

So what Git for Windows has to do is quite a burden: we ship with MSYS2,
kind of a light-weight version of Cygwin, to run our shell scripts. (I
would wager a bet that at least a third of the bug reports relate to the
POSIX emulation layer we ship to allow us to run Bash and Perl, but even
that is only half the truth: tracking down those bugs takes easily the
majority of my bug hunting/fixing time, hands down.)

The test suite is no exception. I may have ranted about this about six
dozen times already, so I'll save me the time. Just pretend that you now
read seven long paragraphs describing how much it sucks to run the test
suite on Windows.

Take home message: Unix shell scripting is good for personal use. Don't
use it in applications, not if you want to stay portable. Just don't.

Back to the subject: The MSYS2 emulation layer inherits a neat trick from
Cygwin, where it *can* create file names containing pipe symbols. They
will be quietly mapped into a private UTF-8 page, and when Cygwin or MSYS2
read the file back, the file name maps from this page back to ASCII
transparently.

That strategy is all good and dandy, as long as you stay within the POSIX
emulation layer.

Git for Windows avoids the POSIX emulation layer as much as possible, for
speed, and also for robustness.

Which means that Git does *not* map the file names using said private
UTF-8 code page. And therefore, your test would succeed (because the shell
script would stay within the POSIX emulation layer, which creates that
file using above-mentioned strategy), but Git (being a regular Win32
program) *still* would fail to create said file.

Ciao,
Dscho

Re: [PATCH] compat/regex: fix compilation on Windows

2017-05-13 Thread Johannes Schindelin

Hi Ævar,

I originally replied in a very verbose manner, going step by step through
the "one-liner", but decided to rephrase everything.

So here goes.

On Sat, 13 May 2017, Ævar Arnfjörð Bjarmason wrote:

> Let's drop this current gawk import series.

Well, the reason why you imported the current gawk regex is that we (you?)
decided originally that it'd be easier to go with that one rather than
with the gnulib one (which they friendly-forked). If you switch to gnulib,
I would like to see (in the commit message) the original reason we picked
gawk (which forked gnulib's regex code, after all), and why that reason no
longer applies.

I also would strongly prefer a *non* one-liner, in a commit message that
adds the relevant source code from gawk or gnulib *verbatim*, followed by
patches that adjust the code to Git's needs. That way, a future update can
repeat the commands outlined in the first commit message and then
cherry-pick the subsequent patches.

And remember that you do not need to clone the entire repository. You can
1) fetch into the current one, and 2) use a shallow fetch. Example:

git fetch --depth 1 http://git.savannah.gnu.org/r/gawk.git \
gawk-4.1.4

The next commands could be something like

git read-tree --prefix=compat/regex/ FETCH_HEAD:
git clean -dfx -- compat/regex/

Oh, and please do not use `master`. If Git is any indication, a tagged
release will most often be a bit more robust than any in-between commit.

Thanks,
Dscho

Re: [PATCH 04/29] log: add exhaustive tests for pattern style options & config

2017-05-13 Thread Ævar Arnfjörð Bjarmason

On Fri, May 12, 2017 at 6:48 AM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason   writes:
>
>> Add exhaustive tests for how the different grep.patternType options &
>> the corresponding command-line options affect git-log.
>> ...
>> The patterns being passed to fixed/basic/extended/PCRE are carefully
>> crafted to return the wrong thing if the grep engine were to pick any
>> other matching method than the one it's told to use.
>
> That is good; it may be even more helpful to the readers if they are
> given a brief in-code comment.  I had to scratch head while reading
> them.
>
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> ---
>>  t/t4202-log.sh | 77 
>> +-
>>  1 file changed, 76 insertions(+), 1 deletion(-)
>>
>> diff --git a/t/t4202-log.sh b/t/t4202-log.sh
>> index f577990716..6d1411abea 100755
>> --- a/t/t4202-log.sh
>> +++ b/t/t4202-log.sh
>> @@ -262,7 +262,23 @@ test_expect_success 'log --grep -i' '
>>
>>  test_expect_success 'log -F -E --grep= uses ere' '
>>   echo second >expect &&
>> - git log -1 --pretty="tformat:%s" -F -E --grep=s.c.nd >actual &&
> # BRE would need \(s\) to do the same.
>> + git log -1 --pretty="tformat:%s" -F -E --grep="(s).c.nd" >actual &&
>> + test_cmp expect actual
>> +'
>> +
>> +test_expect_success PCRE 'log -F -E --perl-regexp --grep= uses PCRE' '
>> + test_when_finished "rm -rf num_commits" &&
>> + git init num_commits &&
>> + (
>> + cd num_commits &&
>> + test_commit 1d &&
>> + test_commit 2e
>> + ) &&
>> + echo 2e >expect &&
> # In PCRE \d in [\d] is like saying "0-9", and match '2' in 2e
>> + git -C num_commits log -1 --pretty="tformat:%s" -F -E --perl-regexp 
>> --grep="[\d]" >actual &&
>> + test_cmp expect actual &&
>> + echo 1d >expect &&
> # In ERE [\d] is bs or letter 'd' and would not match '2' in '2e'
> # but does match 'd' in '1d'
>> + git -C num_commits log -1 --pretty="tformat:%s" -F -E --grep="[\d]" 
>> >actual &&
>>   test_cmp expect actual
>>  '
>>
>> @@ -280,6 +296,65 @@ test_expect_success 'log with grep.patternType 
>> configuration and command line' '
>>   test_cmp expect actual
>>  '
>>
>> +test_expect_success 'log with various grep.patternType configurations & 
>> command-lines' '
>> + git init pattern-type &&
>> + (
>> + cd pattern-type &&
>> + test_commit 1 file A &&
>> + test_commit "(1|2)" file B &&
>> +
>> + echo "(1|2)" >expect.fixed &&
>> + cp expect.fixed expect.basic &&
>> + cp expect.fixed expect.extended &&
>> + cp expect.fixed expect.perl &&
>> +
> # Fixed finds these literally
>> + git -c grep.patternType=fixed log --pretty=tformat:%s \
>> + --grep="(1|2)" >actual.fixed &&
> # BRE matches (, |, and ) literally
>> + git -c grep.patternType=basic log --pretty=tformat:%s \
>> + --grep="(.|.)" >actual.basic &&
> # ERE needs | quoted with bs to match | literally
>> + git -c grep.patternType=extended log --pretty=tformat:%s \
>> + --grep="\|2" >actual.extended &&
>
> If we use BRE by mistake, wouldn't this pattern actually find
> "(1|2)" anyway, without skipping it and show "1 file A" instead?

It'll find (1|2) but also 1, i.e.:

$ (echo 1; echo "(1|2)") >/tmp/f; for t in G E; do echo $t: && grep
-$t '\|2' /tmp/f | sed 's/^/  /'; done
G:
  1
  (1|2)
E:
  (1|2)

So the test will fail under basic. I'll add comments about this & the
other things you suggested.

>> + if test_have_prereq PCRE
>> + then
>> + git -c grep.patternType=perl log --pretty=tformat:%s \
>> + --grep="[\d]\|" >actual.perl
> # Only PCRE would match [\d]\| with "(1|2)" due to 
> [\d]
>> + fi &&
>> + test_cmp expect.fixed actual.fixed &&
>> + test_cmp expect.basic actual.basic &&
>> + test_cmp expect.extended actual.extended &&
>> + if test_have_prereq PCRE
>> + then
>> + test_cmp expect.perl actual.perl
>> + fi &&
>
> It could be just a style thing, but I would actually find it easier
> to follow if the above two "only with PCRE" tests, i.e. running test
> and taking its output in actual.perl and comparing it with
> expect.perl, were inside a single "if test_have_prereq PCRE" block.

Sure, will fix.

Re: [PATCH v4 03/12] t/test-lib-functions.sh: allow to specify the tag name to test_commit

2017-05-13 Thread Ævar Arnfjörð Bjarmason

On Tue, Feb 12, 2013 at 11:17 AM, Brandon Casey  wrote:
> The  part of test_commit() may not be appropriate for a tag name.
> So let's allow test_commit to accept a fourth argument to specify the tag
> name.

[Kind of late to notice, I know]

I see nobody spotted in four rounds of reviews that this commit didn't
update the corresponding t/README docs for test_commit.

> Signed-off-by: Brandon Casey 
> Reviewed-by: Jonathan Nieder 
> ---
>  t/test-lib-functions.sh | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> index fa62d01..61d0804 100644
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -135,12 +135,12 @@ test_pause () {
> fi
>  }
>
> -# Call test_commit with the arguments " [ []]"
> +# Call test_commit with the arguments " [ [ 
> []]]"
>  #
>  # This will commit a file with the given contents and the given commit
> -# message.  It will also add a tag with  as name.
> +# message, and tag the resulting commit with the given tag name.
>  #
> -# Both  and  default to .
> +# , , and  all default to .
>
>  test_commit () {
> notick= &&
> @@ -168,7 +168,7 @@ test_commit () {
> test_tick
> fi &&
> git commit $signoff -m "$1" &&
> -   git tag "$1"
> +   git tag "${4:-$1}"
>  }
>
>  # Call test_merge with the arguments " ", where 
> --
> 1.8.1.3.579.gd9af3b6
>
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 09/29] grep: amend submodule recursion test for regex engine testing

2017-05-13 Thread Ævar Arnfjörð Bjarmason

On Fri, May 12, 2017 at 6:59 AM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason   writes:
>
>> Amend the submodule recursion test to prepare it for subsequent tests
>> of whether it passes along the grep.patternType to the submodule
>> greps.
>>
>> This is the result of searching & replacing:
>>
>> foobar -> (1|2)d(3|4)
>> foo-> (1|2)
>> bar-> (3|4)
>> ...
>>  test_expect_success 'grep and multiple patterns' '
>>   cat >expect <<-\EOF &&
>> - b/b:bar
>> + b/b:(3|4)
>>   EOF
>>
>> - git grep -e "bar" --and --not -e "foo" --recurse-submodules >actual &&
>> + git grep -e "(3|4)" --and --not -e "(1|2)d" --recurse-submodules 
>> >actual &&
>
>
> This breaks the promise "foo maps to (1|2)"; I do not think you need
> to add 'd' in order to make the test to succeed, so I am not sure
> what is going on here.

Thanks, fixing it. That was just a stupid mistake on my part, don't
know how that snuck in there, must have just fat-fingered (1|2) as
(1|2)d during interactive replace.

[PATCH 5/5] p0004: don't error out if test repo is too small

2017-05-13 Thread René Scharfe

Repositories with less than 4000 entries are always handled using a
single thread, causing test-lazy-init-name-hash --multi to error out.
Don't abort the whole test script in that case, but simply skip the
multi-threaded performance check.  We can still use it to compare the
single-threaded speed of different versions in that case.

Signed-off-by: Rene Scharfe 
---
 t/perf/p0004-lazy-init-name-hash.sh | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/t/perf/p0004-lazy-init-name-hash.sh 
b/t/perf/p0004-lazy-init-name-hash.sh
index 3c2135a185..8de5a98cfc 100755
--- a/t/perf/p0004-lazy-init-name-hash.sh
+++ b/t/perf/p0004-lazy-init-name-hash.sh
@@ -8,10 +8,13 @@ test_checkout_worktree
 
 test_expect_success 'verify both methods build the same hashmaps' '
test-lazy-init-name-hash --dump --single >out.single &&
-   test-lazy-init-name-hash --dump --multi >out.multi &&
-   sort sorted.single &&
-   sort sorted.multi &&
-   test_cmp sorted.single sorted.multi
+   if test-lazy-init-name-hash --dump --multi >out.multi
+   then
+   test_set_prereq REPO_BIG_ENOUGH_FOR_MULTI &&
+   sort sorted.single &&
+   sort sorted.multi &&
+   test_cmp sorted.single sorted.multi
+   fi
 '
 
 test_expect_success 'calibrate' '
@@ -46,7 +49,7 @@ test_perf "single-threaded, $desc" "
test-lazy-init-name-hash --single --count=$count
 "
 
-test_perf "multi-threaded, $desc" "
+test_perf REPO_BIG_ENOUGH_FOR_MULTI "multi-threaded, $desc" "
test-lazy-init-name-hash --multi --count=$count
 "
 
-- 
2.12.2

[PATCH 4/5] p0004: don't abort if multi-threaded is too slow

2017-05-13 Thread René Scharfe

If the single-threaded variant beats the multi-threaded one then we may
have a performance bug, but that doesn't justify aborting the test.
Drop that check; we can compare the results for --single and --multi
using the actual performance tests.

Signed-off-by: Rene Scharfe 
---
 t/perf/p0004-lazy-init-name-hash.sh | 4 
 1 file changed, 4 deletions(-)

diff --git a/t/perf/p0004-lazy-init-name-hash.sh 
b/t/perf/p0004-lazy-init-name-hash.sh
index d30c32f97b..3c2135a185 100755
--- a/t/perf/p0004-lazy-init-name-hash.sh
+++ b/t/perf/p0004-lazy-init-name-hash.sh
@@ -14,10 +14,6 @@ test_expect_success 'verify both methods build the same 
hashmaps' '
test_cmp sorted.single sorted.multi
 '
 
-test_expect_success 'multithreaded should be faster' '
-   test-lazy-init-name-hash --perf >out.perf
-'
-
 test_expect_success 'calibrate' '
entries=$(wc -l

[PATCH 3/5] p0004: use test_perf

2017-05-13 Thread René Scharfe

The perf test suite (more specifically: t/perf/aggregate.perl) requires
each test script to write test results into a file, otherwise it aborts
when aggregating.  Add actual performance tests with test_perf to allow
p0004 to be run together with other perf scripts.

Calibrate the value for the parameter --count based on the size of the
test repository, in order to get meaningful results with smaller repos
yet still be able to finish the script against huge ones without having
to wait for hours.

Signed-off-by: Rene Scharfe 
---
The numbers are just guesses; I didn't actually test all ranges.

 t/perf/p0004-lazy-init-name-hash.sh | 36 
 1 file changed, 36 insertions(+)

diff --git a/t/perf/p0004-lazy-init-name-hash.sh 
b/t/perf/p0004-lazy-init-name-hash.sh
index 576bdc3c4e..d30c32f97b 100755
--- a/t/perf/p0004-lazy-init-name-hash.sh
+++ b/t/perf/p0004-lazy-init-name-hash.sh
@@ -18,4 +18,40 @@ test_expect_success 'multithreaded should be faster' '
test-lazy-init-name-hash --perf >out.perf
 '
 
+test_expect_success 'calibrate' '
+   entries=$(wc -l

[PATCH 2/5] p0004: avoid using pipes

2017-05-13 Thread René Scharfe

The return code of commands on the producing end of a pipe is ignored.
Evaluate the outcome of test-lazy-init-name-hash by calling sort
separately.

Signed-off-by: Rene Scharfe 
---
 t/perf/p0004-lazy-init-name-hash.sh | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/t/perf/p0004-lazy-init-name-hash.sh 
b/t/perf/p0004-lazy-init-name-hash.sh
index 716c951553..576bdc3c4e 100755
--- a/t/perf/p0004-lazy-init-name-hash.sh
+++ b/t/perf/p0004-lazy-init-name-hash.sh
@@ -7,9 +7,11 @@ test_perf_large_repo
 test_checkout_worktree
 
 test_expect_success 'verify both methods build the same hashmaps' '
-   test-lazy-init-name-hash --dump --single | sort >out.single &&
-   test-lazy-init-name-hash --dump --multi  | sort >out.multi  &&
-   test_cmp out.single out.multi
+   test-lazy-init-name-hash --dump --single >out.single &&
+   test-lazy-init-name-hash --dump --multi >out.multi &&
+   sort sorted.single &&
+   sort sorted.multi &&
+   test_cmp sorted.single sorted.multi
 '
 
 test_expect_success 'multithreaded should be faster' '
-- 
2.12.2

[PATCH 1/5] p0004: simplify calls of test-lazy-init-name-hash

2017-05-13 Thread René Scharfe

The test library puts helpers into $PATH, so we can simply call them
without specifying their location.

The suffix $X is also not necessary because .exe files on Windows can be
started without specifying their extension, and on other platforms it's
empty anyway.

Signed-off-by: Rene Scharfe 
---
 t/perf/p0004-lazy-init-name-hash.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/perf/p0004-lazy-init-name-hash.sh 
b/t/perf/p0004-lazy-init-name-hash.sh
index 5afa8c8df3..716c951553 100755
--- a/t/perf/p0004-lazy-init-name-hash.sh
+++ b/t/perf/p0004-lazy-init-name-hash.sh
@@ -7,13 +7,13 @@ test_perf_large_repo
 test_checkout_worktree
 
 test_expect_success 'verify both methods build the same hashmaps' '
-   $GIT_BUILD_DIR/t/helper/test-lazy-init-name-hash$X --dump --single | 
sort >out.single &&
-   $GIT_BUILD_DIR/t/helper/test-lazy-init-name-hash$X --dump --multi  | 
sort >out.multi  &&
+   test-lazy-init-name-hash --dump --single | sort >out.single &&
+   test-lazy-init-name-hash --dump --multi  | sort >out.multi  &&
test_cmp out.single out.multi
 '
 
 test_expect_success 'multithreaded should be faster' '
-   $GIT_BUILD_DIR/t/helper/test-lazy-init-name-hash$X --perf >out.perf
+   test-lazy-init-name-hash --perf >out.perf
 '
 
 test_done
-- 
2.12.2

[PATCH 0/5] p0004: support being called by t/perf/run

2017-05-13 Thread René Scharfe

p0004-lazy-init-name-hash.sh errors out if the test repo is too small,
and doesn't generate any perf test results even if it finishes
successfully.  That prevents t/perf/run from running the whole test
suite.  This series tries to address these issues.

  p0004: simplify calls of test-lazy-init-name-hash
  p0004: avoid using pipes
  p0004: use test_perf
  p0004: don't abort if multi-threaded is too slow
  p0004: don't error out if test repo is too small

 t/perf/p0004-lazy-init-name-hash.sh | 47 +
 1 file changed, 42 insertions(+), 5 deletions(-)

-- 
2.12.2

1 2 >

1 - 100 of 120 matches

Mail list logo