Interesting! The odd thing is that it works perfectly well on Linux platforms, at least, so I guess it must have something to do with the Mac locales. Thanks!
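
For anyone following along, here is a minimal sketch of the rule Peter describes in the quoted message below: a character is escaped unless the printability table in use says it is printable. It is written in Python, not R, and it is only a model of the idea, not R's actual encodeString() implementation. The names encode_string, MISSING_FROM_TABLE and stale_table_is_printable are made up for illustration, and the "stale table" is a hypothetical stand-in for a locale that has no entry for U+04CF.

```python
import unicodedata


def encode_string(s, is_printable):
    """Escape every code point that the given predicate does not report as printable."""
    out = []
    for ch in s:
        if is_printable(ch):
            out.append(ch)
        else:
            cp = ord(ch)
            # \uXXXX for code points in the BMP, \UXXXXXXXX beyond it
            out.append(f"\\u{cp:04x}" if cp <= 0xFFFF else f"\\U{cp:08x}")
    return "".join(out)


# Assumption for illustration only: a table that lacks an entry for U+04CF.
MISSING_FROM_TABLE = {0x04CF}


def stale_table_is_printable(ch):
    """Printable according to Python's Unicode data, minus the 'missing' entries."""
    return ch.isprintable() and ord(ch) not in MISSING_FROM_TABLE


palochka = "\u04cf"
cyrillic_e = "\u044d"

# Python's bundled Unicode data classifies U+04CF as a lowercase letter.
print(unicodedata.category(palochka))

# With Python's own table the character passes through unchanged.
print(encode_string(palochka, str.isprintable) == palochka)

# With the stale table it comes back as a six-character escape sequence.
print(encode_string(palochka, stale_table_is_printable))

# U+044D is known to both tables, so it is never escaped.
print(encode_string(cyrillic_e, stale_table_is_printable) == cyrillic_e)
```

The point of the two predicates is that the same escaping logic gives different results depending only on which table answers the "is this printable?" question, which would fit the Mac versus Linux difference if the two platforms ship different locale data.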

On Sun, May 7, 2017 at 1:51 PM, peter dalgaard <pda...@gmail.com> wrote:
>
>> On 7 May 2017, at 08:36 , Oliver Keyes <ironho...@gmail.com> wrote:
>>
>> Hey all,
>>
>> I've run into a weird quirk on Mac platforms, which you can read fully
>> at https://github.com/Ironholds/urltools/issues/70
>>
>> The long and the short of it is that one specific codepoint - \u04cf -
>> does not print in a UTF-8-y way by default, except when run through
>> cat(). Compare, for example:
>>
>> encodeString("\u04cf")
>>
>> and:
>>
>> encodeString("\u044D")
>>
>> Kevin Ushey was kind enough to bring his expertise, and found that it
>> may be a locale-specific problem as well as a Mac-specific problem,
>> because 'sourcetools' shows that there's no locale information for the
>> character. But this only appears in R - Python has it display
>> perfectly - so I'm kind of at a loss. Does anyone know what's going
>> on?
>
> Python being less careful than R?
>
> Basically, things get encoded if not known to be printable, and "Cyrillic
> Small Letter Palochka" is (it seems) not recorded as printable in the common
> utf-8 locales. From what I can google, it is used in Chechen and even then
> only as a postfix to certain characters.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd....@cbs.dk Priv: pda...@gmail.com

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.