-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Samir Wadhawan on 1/31/2008 2:41 PM:
| Dear Mike Haertel,

Coreutils is maintained by more than just Mike (for that matter, it has
been years since Mike made any contributions, according to the ChangeLog).

| As indicated in the join's manpage, we ensured that the columns on
| which the join was being produced were sorted using these commands before
| the join was conducted:
|
| sort -k 5 file1 > file1.srt
| sort -k 1 file2 > file2.srt
|
| Surprisingly we notice that join proceeds WITHOUT errors when we use this
| variant of sort:
|
| sort -k 5,5 file1 > file1.srt
| sort -k 1,1 file2 > file2.srt

Thanks for the report; however, this is probably not a bug, but a locale
issue. "sort -k 5 file1" is different from "sort -k 5,5 file1". The first
sorts by the characters starting in the fifth field and going to the end
of the line, while the second sorts only by the fifth field. Depending on
your current LC_COLLATE settings, this may be significant. In both cases,
since your input file1 had repeats in field 5, sort must fall back on the
entire line to resolve lines that otherwise compare equal. Also, since you
didn't use -b for sort, the leading blanks figure into the key, which may
impact which lines compare equal.

| Clearly, the only difference between the above two variants of sort
| command is the additional sorting order of the columns following the ones
| on which the sort is being generated. This behaviour puzzles us as the
| join seems to be producing different (inconsistent) outputs, and appears
| to be sensitive to the sorting order of other columns in the file.

Join can only produce consistent output if its input is consistently
sorted; it appears that your sorting is not consistent enough for join's
purposes, given your locale settings.

| We tried to reproduce this behaviour on an AIX machine, but find that
| both the variants of sorted files produces consistent
| join results.

Most likely because it had different locale settings.
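
To make the difference between the two key forms concrete, here is a
minimal sketch. It is not taken from the reporter's files: the sample
lines, the comma separator (-t,) and the variable names are all made up
for illustration, and it forces LC_ALL=C so that the result does not
depend on which locales are installed. In the original report the fields
were blank-separated and the locale was different, so there the same kind
of mismatch would probably come from the locale's collation rules rather
than from plain byte order.

```shell
# Force byte-wise collation so the result is the same on every machine.
export LC_ALL=C

# Hypothetical comma-separated sample; the join field would be field 1.
sample=$(printf 'a+b,1\na,2\n')

# Open-ended key: runs from field 1 to the end of the line, so the
# separator and everything after it take part in the comparison.
# '+' sorts before ',' in the C locale, so "a+b,1" comes first.
open_key=$(printf '%s\n' "$sample" | sort -t, -k1)

# Closed key: field 1 only. "a" is a prefix of "a+b", so "a,2" comes first.
closed_key=$(printf '%s\n' "$sample" | sort -t, -k1,1)

# sort -c only checks the order. It reports whether the output of the
# open-ended sort is ordered on field 1 alone, which is what join needs.
if printf '%s\n' "$open_key" | sort -c -t, -k1,1 2>/dev/null; then
    open_key_ok=yes
else
    open_key_ok=no
fi

# Leading blanks: hypothetical blank-separated sample, key in field 2.
# Without -b the key of the first line is "  b" (two blanks), which sorts
# before " a"; with -b the keys are "b" and "a".
blanks=$(printf 'x  b\nx a\n')
with_blanks=$(printf '%s\n' "$blanks" | sort -k2,2)
without_blanks=$(printf '%s\n' "$blanks" | sort -b -k2,2)

printf '%s\n' "sort -t, -k1:" "$open_key" "sort -t, -k1,1:" "$closed_key"
printf '%s\n' "ordered on field 1 after -k1? $open_key_ok"
printf '%s\n' "sort -k2,2:" "$with_blanks" "sort -b -k2,2:" "$without_blanks"
```

The point of the sort -c step is that a file can be correctly sorted for
the key you gave to sort and still not be sorted on the single field that
join compares.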

http://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021

- --
Don't work too hard, make some time for fun as well!

Eric Blake             [EMAIL PROTECTED]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHoopr84KuGfSFAYARAjZMAJ46DpbuO5BTE3+ajTQIgGuoahwCFgCeMEn3
KFIq50tdYkD3zkPrhKBu/hg=
=xmsx
-----END PGP SIGNATURE-----

_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils