Andries E. Brouwer wrote:
> uniq(1) says
>
>   Discard all but one of successive identical lines from INPUT
>
> However, this is very misleading. "Identical" does not mean identical
> but "equal if one ignores differences that LC_COLLATE says should be ignored".
>
> This man page line should be changed, adding a reference to the locale.
> As it is now, the words locale and LC_COLLATE do not occur on the man page.
>
> The info file is better and mentions LC_COLLATE.
> But also there the fact that the meanings of "repeated" and "duplicate"
> are modified by LC_COLLATE is not mentioned explicitly.
>
> Andries

How about the attached?

> (Sorting is an operation done on all kinds of data, not only lines of text.
> I would not mind an option that tells sort to ignore the locale rules for
> sorting because what is sorted is not text. That feels cleaner than
> preceding each invocation with LC_COLLATE=C. And locale-free sort also
> is much faster.)

Well it is a very common issue.
http://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021
I'm not sure there is a better solution than what we have though.

cheers,
Pádraig.

From 14d5f083fc6ed571ca0c07e51e7d4365c1ddcd91 Mon Sep 17 00:00:00 2001
From: Pádraig Brady <[email protected]>
Date: Tue, 5 May 2009 12:00:15 +0100
Subject: [PATCH] doc: note the use of LC_COLLATE in comm, join and uniq.

* doc/coreutils.texi (uniq invocation): Simplify the text to remove
the inconsequential mentioning of order, while implying that
LC_COLLATE can alter equality comparisons.
* src/comm.c (usage): Mention LC_COLLATE is significant.
* src/join.c (usage): Ditto
* src/uniq.c (usage): Ditto. Also improve the summary.
Suggestion from Andries Brouwer
---
 doc/coreutils.texi |    4 ++--
 src/comm.c         |    4 ++++
 src/join.c         |    1 +
 src/uniq.c         |    7 +++++--
 4 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 918f44e..b96fdb2 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -4406,8 +4406,8 @@ duplicate lines, perhaps you want to use @code{sort -u}.
 @xref{sort invocation}.
 
 @vindex LC_COLLATE
-Comparisons use the character collating sequence specified by the
-...@env{lc_collate} locale category.
+Comparisons honor the rules specified by the @env{LC_COLLATE}
+locale category.
 
 If no @var{output} file is specified, @command{uniq} writes to standard
 output.
diff --git a/src/comm.c b/src/comm.c
index c60936f..3c5b09a 100644
--- a/src/comm.c
+++ b/src/comm.c
@@ -129,6 +129,10 @@ and column three contains lines common to both files.\n\
 "), stdout);
       fputs (HELP_OPTION_DESCRIPTION, stdout);
       fputs (VERSION_OPTION_DESCRIPTION, stdout);
+      fputs (_("\
+\n\
+Note, comparisons honor the rules specified by `LC_COLLATE'.\n\
+"), stdout);
       emit_bug_reporting_address ();
     }
   exit (status);
diff --git a/src/join.c b/src/join.c
index 992a357..c716698 100644
--- a/src/join.c
+++ b/src/join.c
@@ -204,6 +204,7 @@ separated by CHAR.\n\
 \n\
 Important: FILE1 and FILE2 must be sorted on the join fields.\n\
 E.g., use `sort -k 1b,1' if `join' has no options.\n\
+Note, comparisons honor the rules specified by `LC_COLLATE'.\n\
 If the input is not sorted and some lines cannot be joined, a\n\
 warning message will be given.\n\
 "), stdout);
diff --git a/src/uniq.c b/src/uniq.c
index a3e0fb7..f9b4342 100644
--- a/src/uniq.c
+++ b/src/uniq.c
@@ -135,8 +135,10 @@ Usage: %s [OPTION]... [INPUT [OUTPUT]]\n\
 "),
               program_name);
       fputs (_("\
-Discard all but one of successive identical lines from INPUT (or\n\
-standard input), writing to OUTPUT (or standard output).\n\
+Filter adjacent matching lines from INPUT (or standard input),\n\
+writing to OUTPUT (or standard output).\n\
+\n\
+With no options, matching lines are merged to the first occurence.\n\
 \n\
 "), stdout);
       fputs (_("\
@@ -170,6 +172,7 @@ characters. Fields are skipped before chars.\n\
 \n\
 Note: 'uniq' does not detect repeated lines unless they are adjacent.\n\
 You may want to sort the input first, or use `sort -u' without `uniq'.\n\
+Also, comparisons honor the rules specified by `LC_COLLATE'.\n\
 "), stdout);
       emit_bug_reporting_address ();
     }
-- 
1.5.3.6
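
To illustrate the distinction the patch documents between "identical" and "matching" lines, here is a minimal sketch in C. It is not the code used by comm, join or uniq; the function names lines_identical and lines_match are hypothetical, and it assumes NUL-terminated lines, so it relies only on the standard strcmp and strcoll functions.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Byte-for-byte equality: what "identical" suggests to most readers.
   (Hypothetical helper, for illustration only.)  */
static bool
lines_identical (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);
  return strcmp (a, b) == 0;
}

/* Equality under the LC_COLLATE rules currently in effect.
   In the "C" locale strcoll orders strings as strcmp does; in other
   locales it may return 0 for strings whose bytes differ.
   (Hypothetical helper, for illustration only.)  */
static bool
lines_match (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);
  return strcoll (a, b) == 0;
}
```

A program would typically call setlocale (LC_COLLATE, "") first so that lines_match follows the user's environment; calling setlocale (LC_COLLATE, "C") instead makes the two functions agree, which is the effect of prefixing an invocation with LC_COLLATE=C.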

_______________________________________________
Bug-coreutils mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-coreutils
