Re: Unicode Paths

Sebastian Spaeth Fri, 16 Sep 2011 03:59:05 -0700

On Thu, 15 Sep 2011 13:52:12 -0400, Austin Clements <amdra...@mit.edu> wrote:
> On Tue, Sep 13, 2011 at 11:55 PM, Martin Owens <docto...@gmail.com> wrote:
> > Hello Again,
> >
> > I notice in the lib code notmuch_database_open(),
> > notmuch_database_create() these functions use const char *path for the
> > directory path input. Is this unicode safe?
> >
> > The python bindings (and ctype docs) seem to suggest using something
> > called 'wchar_t *' for accepting unicode but that's for C not C++.
> >
> > Is this something that should be patched?
> 
> char* is the correct type for paths on POSIX systems.  The *meaning*
> of those bytes is a more complicated matter and depends on your locale
> settings.  On old systems it was generally ASCII, on modern systems
> it's generally UTF-8, and it can be many other things.  However, as a
> consequence of UNIX's C heritage, it is *always* terminated with a
> NULL byte and cannot contain embedded NULL's.


Right, that's what we are doing, passing in utf-8 encoded unicode
strings to char*, which should be just fine if that is what the
underlying OS uses.

> wchar_t is another matter entirely.  wchar_t is the type used by C to
> represent wide strings internally, which generally (but not
> necessarily!) means it stores a Unicode code point.  However, this
> isn't an encoding, and different compilers can give wchar_t different
> meanings, so wchar_t strings aren't generally appropriate for storing
> or sharing between processes or with the kernel.

Mmh, I remember I attempted to user wchar_t to pass in unicode objects
directly and it had failed miserably.

Sebastian

pgpmg8LOMgWvP.pgp
Description: PGP signature

_______________________________________________
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch

Re: Unicode Paths

Reply via email to