Re: Questions about Unicode-aware C programs under Linux

SrinTuar Mon, 16 Apr 2007 10:23:11 -0700

The best advice you can get is to steer clear of wide characters.
You should never need any of the wide character functions.
Keep the data in your program internally represented as UTF-8.
The standard byte-oriented "strlen", "strcpy", "strstr", "printf", etc.
work fine with UTF-8, as long as you remember that they count and
compare bytes, not characters.
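
To make that concrete, here is a minimal sketch of byte-oriented handling
of UTF-8 text. The helper names utf8_codepoints() and utf8_contains() are
made up for this illustration and are not part of any library; the code
also assumes its input is valid, NUL-terminated UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count code points in a NUL-terminated UTF-8
 * string by skipping continuation bytes (bit pattern 10xxxxxx).
 * Assumes the input is valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Hypothetical helper: plain byte-wise substring search. For valid
 * UTF-8 this cannot match in the middle of a character, because lead
 * bytes and continuation bytes use disjoint value ranges. */
int utf8_contains(const char *haystack, const char *needle)
{
    return strstr(haystack, needle) != NULL;
}
```

For the four-letter Persian word "سلام", strlen() returns 8 (bytes) while
utf8_codepoints() returns 4. You only need the second number when you
really care about characters; for copying, searching and printing, the
byte count is what you want.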


XML uses UTF-8 by default as well, so little if any conversion between
encodings should be needed. You may have to convert your input from a
legacy encoding to UTF-8, or you could just convert it externally with
something like this (pass -f to iconv to name the input encoding if it
is not your locale's default):

cat inputfile | iconv -t utf-8 | myprogram

Being "Unicode aware" is trivial this way.
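
If the conversion has to happen inside the program rather than in a
pipeline, the iconv(3) interface that the command-line tool is built on
can do the same job. What follows is only a sketch: to_utf8() is a
made-up helper name, it converts a single buffer in one call, and it
treats any conversion error (including an output buffer that is too
small) as failure.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert inlen bytes of `in`, encoded in the
 * charset named `from`, to UTF-8. Writes at most outcap-1 bytes plus a
 * terminating NUL into `out`. Returns the number of bytes written, or
 * (size_t)-1 on any error. */
size_t to_utf8(const char *from, const char *in, size_t inlen,
               char *out, size_t outcap)
{
    if (outcap == 0)
        return (size_t)-1;

    iconv_t cd = iconv_open("UTF-8", from);   /* (tocode, fromcode) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;                    /* unknown charset name */

    char *inp = (char *)in;   /* iconv() takes char **, input is not modified */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap - 1;              /* keep room for the NUL */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;                    /* invalid input or buffer full */

    *outp = '\0';
    return (size_t)(outp - out);
}
```

As an example, to_utf8("ISO-8859-1", "caf\xE9", 4, buf, sizeof buf)
fills buf with the five bytes "caf\xC3\xA9". The charset name here is
just an illustration; use whatever legacy encoding your input is
actually in. A real program reading a large file would call iconv() in
a loop and handle E2BIG and EINVAL instead of giving up.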


2007/4/16, Ali Majdzadeh <[EMAIL PROTECTED]>:

Hello All
Sorry if my questions are elementary. As far as I know, the size of the
wchar_t data type (glibc) is compiler and platform dependent. What is the
best practice for writing portable Unicode-aware C programs? Is it good
practice to use Unicode literals directly in a C program? I have experienced
some problems with glibc's wide character string functions, so I would like
to know whether there is a standard way of programming, or a standard
template, for writing a Unicode-aware C program. By the way, my native
language is Persian. I am working on a C program which reads a Persian text
file, parses it and generates an XML document. This involves many tasks that
need library functions (e.g. wcscpy(), wcsstr(), wcscmp(), fgetws(),
fwprintf(), ...), and, as I mentioned earlier, I have experienced some odd
problems using them (e.g. wcsstr() never succeeds in matching two wchar_t *
Persian strings).

Best Regards
Ali


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
