Interesting! The odd thing is it works perfectly well on Linux
platforms, at least - I guess it must be something to do with the Mac
locales. Thanks!
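
For anyone wanting to check the Python side of the comparison raised in the quoted message, here is a minimal sketch. It assumes Python 3 and uses only the standard library. The helper name describe() is made up for illustration. As far as I know, str.isprintable() consults Python's bundled Unicode database rather than the platform's locale tables, which would explain why Python's answer does not vary between Mac and Linux.

```python
import unicodedata

# U+04CF is the code point discussed in this thread; U+044D is the
# comparison character from the original message.
PALOCHKA = "\u04cf"
CYRILLIC_E = "\u044d"


def describe(ch):
    """Return what Python's bundled Unicode database records for one character.

    Hypothetical helper for illustration only; not part of any library.
    """
    return {
        "codepoint": "U+%04X" % ord(ch),
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        # isprintable() is decided from the Unicode database, not the C locale.
        "printable": ch.isprintable(),
        "utf8": ch.encode("utf-8").hex(),
    }


for ch in (PALOCHKA, CYRILLIC_E):
    print(describe(ch))
```

Both characters come back as lowercase letters (category Ll) and printable, so Python has no reason to escape either of them. This says nothing about what the C library's locale data reports on a given platform, which is the part R appears to depend on.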

On Sun, May 7, 2017 at 1:51 PM, peter dalgaard <pda...@gmail.com> wrote:
>
>> On 7 May 2017, at 08:36 , Oliver Keyes <ironho...@gmail.com> wrote:
>>
>> Hey all,
>>
>> I've run into a weird quirk on Mac platforms, which you can read about in
>> full at https://github.com/Ironholds/urltools/issues/70
>>
>> The long and the short of it is that one specific codepoint - \u04cf -
>> does not print in a UTF-8-y way by default, except when run through
>> cat(). Compare, for example:
>>
>> encodeString("\u04cf")
>>
>> and:
>>
>> encodeString("\u044D")
>>
>> Kevin Ushey was kind enough to bring his expertise, and found that it
>> may be a locale-specific problem as well as a Mac-specific problem,
>> because 'sourcetools' shows that there's no locale information for the
>> character. But this only appears in R - Python displays it
>> perfectly - so I'm kind of at a loss. Does anyone know what's going
>> on?
>
> Python being less careful than R?
>
> Basically, things get escaped if they are not known to be printable, and
> "Cyrillic Small Letter Palochka" is (it seems) not recorded as printable in
> the common UTF-8 locales. From what I can google, it is used in Chechen and
> even then only as a postfix to certain characters.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
