Well, after my brave words about xmemcoll0 unlikely to be misused in 'sort' I found a misuse of it. With sort -u, write_bytes replaces the trailing NUL with a newline but it doesn't change it back after writing. In some cases "sort -u" will output a line and then later use it as an argument for comparison, which will mess up if the trailing NUL has been replaced. I pushed the following patch to work around the problem.
Chen, can you please verify that "sort -u" does not access the same line from multiple threads, even saved lines? If it does, then even this patch is not enough: we would need to alter write_bytes so that it does not modify its argument at all. This patch also tunes keycompare slightly. I should have broken it up into a different patch, but it sort of snuck in. By the way, any objection if I put 'const' after the types that it qualifies, e.g., "char const *" rather than "const char *"? That was the usual style here. Thanks. >From 5a31febb1b363d9a3a59ed60d5f2051d520dd407 Mon Sep 17 00:00:00 2001 From: Paul R. Eggert <[email protected]> Date: Thu, 15 Jul 2010 16:24:03 -0700 Subject: [PATCH] sort: fix a bug with sort -u and xmemcoll0, and tune keycompare * src/sort.c (keycompare): Use xmemcoll0, as it avoids a couple of stores. (write_bytes): Leave the buffer the way we found it, as it might be used again for a later comparison, if -u is used. --- src/sort.c | 12 +++++++----- 1 files changed, 7 insertions(+), 5 deletions(-) diff --git a/src/sort.c b/src/sort.c index 7d31878..960df74 100644 --- a/src/sort.c +++ b/src/sort.c @@ -2497,7 +2497,9 @@ keycompare (const struct line *a, const struct line *b, bool show_debug) } } - diff = xmemcoll (copy_a, new_len_a, copy_b, new_len_b); + copy_a[new_len_a++] = '\0'; + copy_b[new_len_b++] = '\0'; + diff = xmemcoll0 (copy_a, new_len_a, copy_b, new_len_b); if (sizeof buf < size) free (copy_a); @@ -2664,13 +2666,11 @@ write_bytes (struct line const *line, FILE *fp, char const *output_file) { char *buf = line->text; size_t n_bytes = line->length; - - *(buf + n_bytes - 1) = eolchar; + char *ebuf = buf + n_bytes; /* Convert TABs to '>' and \0 to \n when -z specified. */ if (debug && fp == stdout) { - char const *ebuf = buf + n_bytes; char const *c = buf; while (c < ebuf) @@ -2678,7 +2678,7 @@ write_bytes (struct line const *line, FILE *fp, char const *output_file) char wc = *c++; if (wc == '\t') wc = '>'; - else if (wc == 0 && eolchar == 0) + else if (c == ebuf) wc = '\n'; if (fputc (wc, fp) == EOF) die (_("write failed"), output_file); @@ -2688,8 +2688,10 @@ write_bytes (struct line const *line, FILE *fp, char const *output_file) } else { + ebuf[-1] = eolchar; if (fwrite (buf, 1, n_bytes, fp) != n_bytes) die (_("write failed"), output_file); + ebuf[-1] = '\0'; } } -- 1.7.1
