On Mon, Apr 16, 2007 at 11:33:26AM +0330, Ali Majdzadeh wrote:
> Hello All
> Sorry if my questions are elementary. As far as I know, the size of the
> wchar_t data type (glibc) is compiler- and platform-dependent. What is the
> best practice for writing portable Unicode-aware C programs? Is it good
> practice to use Unicode literals directly in a C program?

It depends on the degree of portability you want. Using them in wide
strings is not entirely portable (it depends on the character encoding
the compiler assumes when translating the source), but using them in
UTF-8 strings is (they're just byte sequences).

> I have experienced some problems with glibc's wide-character string
> functions, and I want to know whether there is any standard way of
> programming, or a standard template, for writing a Unicode-aware C
> program. By the way, my native language is Persian. I am working on a C
> program which reads a Persian text file, parses it and generates an XML
> document.

If your application is Persian-specific, then you're completely
entitled to assume the text encoding is UTF-8 and that the system is
capable of dealing with UTF-8 and Unicode. Will there be any
Persian-specific text processing, though, or do you just want to be
able to pass Persian text through?

> For this, there are lots of tasks that need library functions
> (e.g. wcscpy(), wcsstr(), wcscmp(), fgetws(), fwprintf(), ...),
> and, as I mentioned earlier, I have run into some odd problems using
> them (e.g. wcsstr() never succeeds in matching two wchar_t * Persian
> strings).

wcsstr doesn't care about encoding or Unicode semantics or anything.
It just looks for binary substring matches, just like strstr but using
wchar_t instead of char as the unit.

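Overall I'd suggest ignoring the wchar_t functions. The wide stdio
functions in particular are problematic: once a stream has been used
with wide functions it can't be used with byte functions (and vice
versa), and the conversion depends on the LC_CTYPE locale set with
setlocale(). Using UTF-8 is just as easy, and then your strings are
directly usable for input and output to/from text files, the command
line, etc.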

Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
