In my response cited below:

On 28-May-10 09:55:36, Ted Harding wrote:
> I suspect the result (in Linux, I can't test this on Windows)
> may be related to the following phenomenon:
>
>   sort(c("AB CD","ABCD"))
>   # [1] "ABCD"  "AB CD"
>   sort(c("AB CD","ABCD "))
>   # [1] "AB CD" "ABCD "
>
> I.e. "ABCD" precedes "AB CD" apparently because it is shorter,
> despite the fact that it would come later in an alphabetical sort.
> If I use the Linux 'sort' command (on the same machine) I get:
>
>   sort << EOT
>   "AB CD"
>   "ABCD"
>   EOT
>   "AB CD"
>   "ABCD"
>
>   sort << EOT
>   "AB CD"
>   "ABCD "
>   EOT
>   "AB CD"
>   "ABCD "
>
> I.e. the same result for either case. In my view the R result is
> anomalous! In ?Comparison it is stated that characters are translated
> to UTF8 before comparison is done; so a possible explanation could
> be that the UTF8 encoding for SPACE (for all I know) may be greater
> than that for the letters of the alphabet (as opposed to ASCII, where
> -- I do know -- it is less). And, if that is the case, why doesn't it
> apply also in Windows? This strikes me as a nasty little trap!
>
> Ted.
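For anyone who wants to check the byte-order part of this: UTF-8 encodes
SPACE and the ASCII letters with the same byte values as ASCII, so SPACE
(0x20) is below every letter in both. The sketch below forces the C locale
so that 'sort' compares raw bytes. It assumes a POSIX shell with the
standard 'sort' utility; the helper name bytewise_sort is made up for
illustration.

```shell
# Hypothetical helper: run sort in the C locale, which compares raw bytes
# rather than applying the collation rules of the user's locale.
bytewise_sort() {
  LC_ALL=C sort
}

# SPACE is byte 0x20 and "C" is byte 0x43, so "AB CD" sorts first.
printf '%s\n' 'ABCD' 'AB CD' | bytewise_sort
# AB CD
# ABCD

# The trailing blank on "ABCD " does not change the order: the strings
# already differ at the third character.
printf '%s\n' 'ABCD ' 'AB CD' | bytewise_sort
# AB CD
# ABCD 
```

Whether 'sort' or R's sort() gives a different order without LC_ALL=C
depends on the collation rules of whatever locale is active on the machine.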
Please ignore the stuff about UTF8 -- the reasoning is false!
(since then "ABCD" and "ABCD " would always precede "AB CD").
I.e. read it as:

  I.e. the same result for either case. In my view the R result is
  anomalous! This strikes me as a nasty little trap!

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-May-10                                       Time: 11:05:44
------------------------------ XFMail ------------------------------

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.