Re: [julia-users] readdir returns inconsistent types

Milan Bouchet-Valat Tue, 13 Jan 2015 13:32:28 -0800

On Tuesday, 13 January 2015 at 12:25 -0800, Steven G. Johnson wrote:
> On Tuesday, January 13, 2015 at 1:47:12 AM UTC-5, [email protected]
> wrote: 
>         
>         But since those annoying operating systems can return
>         filenames encoded in non-UTF8 it probably will not be safe in
>         0.4 to just return a UTF8 string.
> 
> 
> Whatever encoding the operating system uses, Julia (or actually libuv)
> will convert it to UTF-8.   For example, on Windows libuv
> uses FindFirstFileW, which returns the filename in the UTF-16
> encoding, but this is converted to UTF-8.
Actually, not on Unix. AFAICT libuv returns the filename as raw bytes
there, as there's no reliable way to know which encoding a filesystem
uses. Reasonable systems use UTF-8 everywhere, but that's absolutely
not guaranteed. Sometimes a filename may contain byte sequences that
are not valid UTF-8 (possibly text in another, unknown, encoding).


So the best solution for Julia is to return it as a UTF8String, knowing
that in some cases it may contain invalid UTF-8. Validating file paths
is not an option, as it would make such files inaccessible. Luckily,
most of the time you don't really need to process the file path; you
just pass it around without any modifications. (It would also be
possible to create a special string type for file paths, but I'm not
sure it's worth it.)
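
To make the point concrete, here is a minimal sketch in Python (not
Julia, and not what libuv or readdir actually do internally). The file
name and the helpers is_valid_utf8 and join_path are hypothetical,
chosen only for illustration. It assumes a name that was written as
Latin-1: the bytes are not valid UTF-8, yet they can still be carried
around and joined into a path as long as nothing forces a decode.

```python
# Hypothetical filename as a Unix system might hand it back:
# "café.txt" encoded as Latin-1, so the é is the single byte 0xE9.
raw_name = b"caf\xe9.txt"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def join_path(directory: bytes, name: bytes) -> bytes:
    """Join a directory and a file name without inspecting the encoding."""
    return directory.rstrip(b"/") + b"/" + name


# Strict validation would reject this name and make the file unreachable.
name_is_valid = is_valid_utf8(raw_name)

# Passing the bytes through untouched keeps the path usable.
path = join_path(b"/tmp/data", raw_name)
```

The design choice is the same one argued for above: treat the name as
opaque data and only interpret it as text when you actually have to
display or parse it.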

See this post for a complaint about Python 3 enforcing Unicode file
paths:
http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/


Regards
