In my response cited below:

On 28-May-10 09:55:36, Ted Harding wrote:
> I suspect the result (in Linux; I can't test this on Windows)
> may be related to the following phenomenon:
> 
>   sort(c("AB CD","ABCD"))
>   # [1] "ABCD"  "AB CD"
>   sort(c("AB CD","ABCD "))
>   # [1] "AB CD" "ABCD "
> 
> I.e. "ABCD" precedes "AB CD" apparently because it is shorter,
> despite the fact that it would come later in an alphabetical sort.
> If I use the Linux 'sort' command (on the same machine) I get:
> 
> sort << EOT
> "AB CD"
> "ABCD"
> EOT
> "AB CD"
> "ABCD"
> 
> sort << EOT
> "AB CD"
> "ABCD "
> EOT
> "AB CD"
> "ABCD "
> 
> I.e. the same result for either case. In my view the R result is
> anomalous! In ?Comparison it is stated that characters are translated
> to UTF8 before comparison is done; so a possible explanation could
> be that the UTF8 encoding for SPACE (for all I know) may be greater
> than that for the letters of the alphabet (as opposed to ASCII, where
> -- I do know -- it is less). And, if that is the case, why doesn't it
> apply also in Windows? This strikes me as a nasty little trap!
> 
> Ted.

Please ignore the stuff about UTF8 -- the reasoning is false!
(since then "ABCD" and "ABCD " would always precede "AB CD").
I.e. read it as:

  I.e. the same result for either case. In my view the R result is
  anomalous! This strikes me as a nasty little trap!
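
One way to check whether locale collation is behind the difference
(an assumption, not something established in this thread): sort() on
character vectors follows the collating rules of the current locale
(LC_COLLATE), and many non-C locales give spaces little or no weight
when comparing strings. The sketch below forces C-locale (byte-order)
sorting, which should agree with the Linux 'sort' output quoted above.
It assumes R >= 3.3.0, where sort() accepts method = "radix"; the
variable names are only illustrative.

```r
x1 <- c("AB CD", "ABCD")
x2 <- c("AB CD", "ABCD ")

# Show which collation rules the default sort() is using on this machine.
print(Sys.getlocale("LC_COLLATE"))

# Locale-dependent result: this is the one that can differ between systems.
print(sort(x1))
print(sort(x2))

# method = "radix" sorts character vectors in the C-locale (byte order),
# regardless of the session's LC_COLLATE setting.
sorted1_c <- sort(x1, method = "radix")
sorted2_c <- sort(x2, method = "radix")
print(sorted1_c)
print(sorted2_c)
```

In byte order SPACE (32) sorts before "C" (67), so "AB CD" comes first
in both cases. If the default sort() gives a different order from the
"radix" one, the locale's collation is the likely cause.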

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-May-10                                       Time: 11:05:44
------------------------------ XFMail ------------------------------

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
