Re: [openZIM dev-l] wide character filenames support

Emmanuel Engelhart Mon, 22 Nov 2010 11:46:05 -0800

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 22/11/2010 14:25, Asaf Bartov wrote:
> Note that the bug exists in GNU/Linux as well -- it's just better hidden...
> :)
> UTF8 uses a _variable_ amount of bytes to encode a code point.  Often a
> single byte is enough.  But if your filename includes very special
> characters, such as an "em-dash" (–) or an IPA character such as *ʧ* --
> then the character would take up two bytes, and for some obscure characters
> it can be up to _four_ bytes.


I do not think there is an issue with UTF-8, either in libzim or in
Kiwix, for file names containing an em-dash. I have tested it and it
works. The reason, I think, is that the kernel interprets the char*
string directly as UTF-8 (ext3/4 uses UTF-8).
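
To make the byte counts concrete, here is a minimal sketch of UTF-8
encoding for a single code point. The function name encodeUtf8 is
hypothetical (it is not part of libzim or Kiwix), and it assumes the
input is a valid Unicode code point. As far as I know, on GNU/Linux
these bytes reach open() unchanged, which would explain why the test
works there.

```cpp
#include <string>

// Hypothetical helper, not a libzim API: encode one code point as UTF-8.
// Assumes cp is a valid Unicode code point (no surrogate or range checks).
std::string encodeUtf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                 // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: e.g. accented Latin letters, IPA
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: rest of the Basic Multilingual Plane
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: code points above U+FFFF
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

By this count, encodeUtf8(0x02A7) (the IPA character ʧ) gives two
bytes and encodeUtf8(0x2014) (the em-dash) gives three.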

But on Windows, it is not possible to interpret the char* directly as
UTF-16, otherwise an ASCII-encoded path would not work. So I suppose
that the STL open() & co (or the kernel) convert the charset to UTF-16
before asking the filesystem.

So if you want to open a file whose name contains characters outside
the ASCII charset, I suppose you have to use a special STL open() that
accepts wchar and give it the path directly in UTF-16.
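
A rough sketch of what that could look like, assuming the path is
kept as UTF-8 internally and converted only at the point of opening
the file. The names utf8ToUtf16 and openForRead are hypothetical and
not part of libzim; the conversion assumes valid UTF-8 input and does
no error handling. On Windows it relies on _wfopen from the Microsoft
C runtime; elsewhere it falls back to plain std::fopen.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: convert UTF-8 to UTF-16.
// Assumes the input is valid UTF-8 (malformed input is not detected).
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte tells how many bytes the sequence has.
        if (c < 0x80)      { cp = c;        len = 1; }
        else if (c < 0xE0) { cp = c & 0x1F; len = 2; }
        else if (c < 0xF0) { cp = c & 0x0F; len = 3; }
        else               { cp = c & 0x07; len = 4; }
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical wrapper: open a file for binary reading from a UTF-8 path.
std::FILE* openForRead(const std::string& utf8Path) {
#ifdef _WIN32
    const std::u16string u16 = utf8ToUtf16(utf8Path);
    const std::wstring wide(u16.begin(), u16.end()); // wchar_t is 16 bits on Windows
    return _wfopen(wide.c_str(), L"rb");
#else
    return std::fopen(utf8Path.c_str(), "rb");       // bytes passed through as-is
#endif
}
```

With something like this the callers would only ever deal with UTF-8
strings, and the platform difference stays in one place.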

That is my theory.

> So French accents fit in one byte, but some other characters do not.  If I
> had a ZIM file with such a character on GNU/Linux, the code would fail too.

It does not look like it :)

> We do need a portable solution.   I don't know the right way to do it off
> the top of my head, so perhaps someone else on the list can offer advice.
>  If no one can, I'm willing to figure it out myself.

Yes, that would be great. Tommi, you are the STL expert :)

Thanks for your feedback Asaf.
Emmanuel
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkzqyTQACgkQn3IpJRpNWtO1iQCfcObWOOjHcuyzCk7lOZitQVVf
g/8AoK1GVk+FewIF5JJwZSa3C0iW1lcA
=+iYd
-----END PGP SIGNATURE-----
