P Kensche <[EMAIL PROTECTED]> writes:

> is sorted by "sort -k 1 a"

In general, that's not correct, since "sort -k 1" sorts on a key that
runs from field 1 through the end of the line, whereas 'join' compares
only field 1. You need to use "sort -k 1b,1" instead.

So, as far as I can tell, you haven't found a bug. However,
`-k 1b,1' isn't immediately obvious, and the documentation should be
improved here. I installed the following patch to try to improve
things. Thanks for reporting the problem.

2006-02-20  Paul Eggert  <[EMAIL PROTECTED]>

	* doc/coreutils.texi (join invocation): Mention `sort -k 1b,1'.
	* src/join.c (usage): Likewise. Documentation problem reported by
	Philip Kensche.

--- doc/coreutils.texi 20 Feb 2006 16:50:11 -0000 1.313
+++ doc/coreutils.texi 21 Feb 2006 02:50:39 -0000
@@ -4738,11 +4738,11 @@ lines that have identical join fields.
 join [EMAIL PROTECTED]@dots{} @var{file1} @var{file2}
 @end example

[EMAIL PROTECTED] LC_COLLATE
 Either @var{file1} or @var{file2} (but not both) can be @samp{-},
 meaning standard input. @var{file1} and @var{file2} should be
 sorted on the join fields.

[EMAIL PROTECTED] LC_COLLATE
 Normally, the sort order is that of the
 collating sequence specified by the @env{LC_COLLATE} locale. Unless
 the @option{-t} option is given, the sort comparison ignores blanks at
@@ -4750,7 +4750,14 @@ the start of the join field, as in @code
 @option{--ignore-case} option is given, the sort comparison ignores
 the case of characters in the join field, as in @code{sort -f}.

-However, as a GNU extension, if the input has no unpairable lines the
+The @command{sort} and @command{join} commands should use consistent
+locales and options if the output of @command{sort} is fed to
[EMAIL PROTECTED] You can use a command like @samp{sort -k 1b,1} to
+sort a file on its default join field, but if you select a non-default
+locale, join field, separator, or comparison options, then you should
+do so consistently between @command{join} and @command{sort}.
+
+As a GNU extension, if the input has no unpairable lines the
 sort order can be any order that considers two fields to be equal if
 and only if the sort comparison described above considers them to be
 equal. For example:
@@ -4841,6 +4848,8 @@ option---are subject to the specified @v
 @item -t @var{char}
 Use character @var{char} as the input and output field separator.
 Treat as significant each occurrence of @var{char} in the input file.
+Use @samp{sort -t @var{char}}, without the @option{-b} option of
[EMAIL PROTECTED], to produce this ordering.

 @item -v @var{file-number}
 Print a line for each unpairable line in file @var{file-number}
--- src/join.c 18 Feb 2006 07:22:01 -0000 1.144
+++ src/join.c 21 Feb 2006 02:50:40 -0000
@@ -167,6 +167,7 @@ the remaining fields from FILE1, the rem
 separated by CHAR.\n\
 \n\
 Important: FILE1 and FILE2 must be sorted on the join fields.\n\
+E.g., use `sort -k 1b,1' if `join' has no options.\n\
 "), stdout);
       printf (_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
     }

_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils
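
For anyone who wants to try the advice in this message, here is a minimal shell sketch of the difference between "sort -k 1" and "sort -k 1b,1" when 'join' is run with no options. The file names and sample data are made up for illustration, and setting LC_ALL=C is an assumption chosen so that both tools use the same byte-wise collation; neither is part of the patch. It also assumes a 'mktemp' that supports -d.

```shell
#!/bin/sh
# Pin one locale for both tools so that sort and join agree on collation.
# (Assumption for this sketch; any locale works if it is used consistently.)
LC_ALL=C
export LC_ALL

# Hypothetical sample data; the leading blank on " b 1" is deliberate.
workdir=$(mktemp -d)
printf ' b 1\na 2\n' > "$workdir/left"
printf 'b x\na y\n' > "$workdir/right"

# Key runs from field 1 to the end of the line, and the leading blank
# is part of the key, so " b 1" and "a 2" are compared byte by byte.
whole_line=$(sort -k 1 "$workdir/left")

# Key is field 1 only, with leading blanks skipped: the order that
# join expects when it is given no options.
field_one=$(sort -k 1b,1 "$workdir/left")

# Sort both inputs the same way, then join on the default field.
sort -k 1b,1 "$workdir/left" > "$workdir/left.sorted"
sort -k 1b,1 "$workdir/right" > "$workdir/right.sorted"
joined=$(join "$workdir/left.sorted" "$workdir/right.sorted")

printf '%s\n' "$joined"
rm -rf "$workdir"
```

Comparing $whole_line with $field_one shows the two orderings disagree on this input, which is why only the second is safe to feed to 'join'. If 'join' is given -t CHAR, the patch's advice is to use "sort -t CHAR" without -b instead.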