Hello Rich

Thanks for your response. To answer your question: yes, I need some text
processing capabilities. Do you mean that I should use the common stdio
functions (like fgets(), ...)? And what about UTF-8 strings? Do you mean
that these strings should be stored in ordinary char * variables? If so,
what about the difference in character size (Unicode versus ASCII)? And
what about the string functions (like strtok())?

Sorry, I am new to this issue.

Best Regards
Ali

On 4/16/07, Rich Felker <[EMAIL PROTECTED]> wrote:
On Mon, Apr 16, 2007 at 11:33:26AM +0330, Ali Majdzadeh wrote:
> Hello All
> Sorry, if my questions are elementary. As I know, the size of wchar_t data
> type (glibc), is compiler and platform dependent. What is the best practice
> of writing portable Unicode-aware C programs? Is it a good practice to use
> Unicode literals directly in a C program?

It depends on the degree of portability you want. Using them in wide
strings is not entirely portable (depends on the translation character
encoding), but using them in UTF-8 strings is (they're just byte
sequences).

> I have experienced some problems
> with glibc's wide character string functions, I want to know is there any
> standard way of programming or standard template to write a Unicode-aware C
> program? By the way, my native language is Persian. I am working on a C
> program which reads a Persian text file, parses it and generates an XML
> document.

If your application is Persian-specific, then you're completely
entitled to assume the text encoding is UTF-8 and that the system is
capable of dealing with UTF-8 and Unicode. Will there be any
Persian-specific text processing though, or do you just want to be able
to pass through Persian text?

> For this, there exist lots of issues that need the use of library
> functions (eg. wcscpy(), wcsstr(), wcscmp(), fgetws(), fwprintf(), ...),
> and, as I mentioned earlier, I have experienced some odd problems using
> them. (eg. wcsstr() never succeeds in matching two wchar_t * Persian
> strings.)

wcsstr doesn't care about encoding or Unicode semantics or anything.
It just looks for binary substring matches, just like strstr but using
wchar_t instead of char as the unit.

Overall I'd suggest ignoring the wchar_t functions. The wide stdio
functions are especially problematic. Using UTF-8 is just as easy, and
then your strings are directly usable for input and output to/from
text files, the command line, etc.
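
As a rough illustration of the point that UTF-8 strings are just byte
sequences in ordinary char arrays, here is a minimal sketch in C. The helper
names utf8_count() and utf8_find() are made up for this example and are not
part of any library. The sketch assumes the input is valid UTF-8; it does no
validation and no normalization.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count the code points in a NUL-terminated string
 * that is assumed to be valid UTF-8. Continuation bytes have the bit
 * pattern 10xxxxxx, so every byte that does NOT match that pattern
 * starts a new code point.
 */
static size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/*
 * Hypothetical helper: find needle inside haystack with plain strstr().
 * Returns the BYTE offset of the first match, or -1 if there is none.
 * strstr() compares bytes only; with valid UTF-8 on both sides, a match
 * always starts on a character boundary.
 */
static long utf8_find(const char *haystack, const char *needle)
{
    const char *p = strstr(haystack, needle);
    return p ? (long)(p - haystack) : -1L;
}
```

For example, the Persian text "سلام دنیا" written out as UTF-8 bytes is
"\xD8\xB3\xD9\x84\xD8\xA7\xD9\x85 \xD8\xAF\xD9\x86\xDB\x8C\xD8\xA7". For that
string strlen() reports 17 bytes while utf8_count() reports 9 code points,
and utf8_find() locates the second word at byte offset 9. The difference
between byte length and character count is the only "character size" issue
the program has to keep in mind; storage, fgets() and strstr() all work on
the bytes unchanged.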
Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/