On Mon, Mar 03, 2003 at 12:54:27PM +0000, Franck Arnaud wrote:
> Could you have dummy functions that only do ASCII when glib or whichever
> third party unicode stuff is used? That way you get a soft dependency:
> if it's not there you get slightly crippled functionality but it still
> builds and is not even noticeable to people who don't use this
> functionality.
I currently have my own next/prev char code (really easy with utf8!),
use iconv to convert UTF-8 to wchar_t and use the standard isw* functions
for testing character properties. Unfortunately, while iconv is a standard
function, character set names aren't. On glibc-2.2 based systems the
target character set name is "WCHAR_T", while on older systems that need
separate libiconv (and possibly also libutf8 for wchar support), the target
character set name seems to be "C99". UTF-8 support will be disabled by
default; uncommenting one of the following blocks in system.mk enables it.
# GNU/Linux and other glibc-2.2 based systems.
#DEFINES += -DCF_UTF8 -DCF_ICONV_TARGET=\"WCHAR_T\" \
# -DCF_ICONV_SOURCE=\"UTF-8\"
# Systems that depend on libutf8 and libiconv might want these.
#DEFINES += -DCF_UTF8 -DCF_LIBUTF8 -DCF_ICONV_TARGET=\"C99\" \
# -DCF_ICONV_SOURCE=\"UTF-8\"
#EXTRA_LIBS += -liconv -L/usr/local/lib
#EXTRA_INCLUDES += -I/usr/local/include
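For illustration, the iconv step described above could look roughly like
the sketch below. The function name utf8_to_wcs and the fallback values
of the two macros are assumptions made for this example, not the actual
code; the fallback target "WCHAR_T" is only known to work with glibc's
iconv.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Fallbacks for the example only; system.mk would normally define these. */
#ifndef CF_ICONV_TARGET
#define CF_ICONV_TARGET "WCHAR_T"
#endif
#ifndef CF_ICONV_SOURCE
#define CF_ICONV_SOURCE "UTF-8"
#endif

/* Hypothetical helper: convert the UTF-8 string src into at most max
 * wide characters stored in dst. Returns the number of wide characters
 * written, or (size_t)-1 on any error (unknown charset name, invalid
 * input, or dst too small). */
size_t utf8_to_wcs(const char *src, wchar_t *dst, size_t max)
{
    iconv_t cd=iconv_open(CF_ICONV_TARGET, CF_ICONV_SOURCE);
    char *in=(char*)src;    /* iconv does not modify the input bytes */
    char *out=(char*)dst;
    size_t inleft=strlen(src);
    size_t outleft=max*sizeof(wchar_t);
    size_t ret;

    if(cd==(iconv_t)-1)
        return (size_t)-1;

    ret=iconv(cd, &in, &inleft, &out, &outleft);
    iconv_close(cd);

    if(ret==(size_t)-1)
        return (size_t)-1;

    /* outleft is what remains of the output buffer, in bytes. */
    return max-outleft/sizeof(wchar_t);
}
```

The resulting wide characters can then be passed to iswalpha, iswspace
and friends, whose answers depend on the locale set with setlocale.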
There is also another problem with the UTF-8 support code: the
Xutf8DrawString and Xutf8TextExtents functions are broken. If locales
aren't correctly set up, they stop at every non-ASCII character
instead of ignoring it or printing a box. The corresponding Xmb
functions work properly.
An alternative to requiring UTF-8 might be to use the generic multibyte
encoding support functions of X and libc, while attempting to detect UTF-8
and 8-bit encodings so that at least those, if not every encoding, can be
processed efficiently. See the awful code in the attached file. It
fails for 7-bit ASCII, but that shouldn't be too hard to fix.
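One possible way to handle the 7-bit ASCII case is to look at
nl_langinfo(CODESET) before probing with mbtowc. The sketch below is only
an illustration: classify_codeset and the enum are hypothetical names, and
the codeset strings are assumptions, since these names are not
standardized ("ANSI_X3.4-1968" is what glibc reports for the C locale).
Anything not recognized would still have to go through the mbtowc probe.

```c
#include <string.h>

/* Hypothetical classification used only in this example. */
enum codeset_kind { CS_8BIT, CS_UTF8, CS_OTHER };

/* Classify a codeset name as returned by nl_langinfo(CODESET).
 * The list of names is an assumption and is not exhaustive. */
enum codeset_kind classify_codeset(const char *cs)
{
    if(strcmp(cs, "UTF-8")==0 || strcmp(cs, "UTF8")==0)
        return CS_UTF8;

    /* 7-bit ASCII can be handled exactly like an 8-bit encoding. */
    if(strcmp(cs, "ANSI_X3.4-1968")==0 || strcmp(cs, "US-ASCII")==0 ||
       strcmp(cs, "ASCII")==0)
        return CS_8BIT;

    if(strncmp(cs, "ISO-8859-", 9)==0 || strncmp(cs, "ISO8859-", 8)==0)
        return CS_8BIT;

    /* Unknown: fall back to probing or to the generic mb* functions. */
    return CS_OTHER;
}
```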
\begin{rant}
Why can't *nix libc:s have an mbsdec function? M$ and Watcom seem to have.
\end{rant}
--
Tuomo
#include <wchar.h>
#include <wctype.h>
#include <langinfo.h>
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <string.h>
#define FALSE 0
#define TRUE 1
#define bool int
static bool e_8bit=FALSE;
static bool e_utf8=FALSE;
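/* Return the byte offset of the character preceding offset pos in p. */
int str_prev_off(char *p, int pos)
{
    if(pos==0)
        return 0;

    if(e_8bit)
        return pos-1;

    if(e_utf8){
        /* Step back over UTF-8 continuation bytes (10xxxxxx). */
        while(pos>0){
            pos--;
            if((p[pos]&0xC0)!=0x80)
                break;
        }
        return pos;
    }else{
        /* *sigh* No mbsdec, so rescan from the start of the string. */
        size_t l;
        int prev=0;
        mbstate_t ps;

        memset(&ps, 0, sizeof(ps));

        while(1){
            l=mbrlen(p+prev, pos-prev, &ps);
            /* mbrlen returns size_t: -1 is invalid, -2 is incomplete. */
            if(l==(size_t)-1 || l==(size_t)-2){
                fprintf(stderr, "Invalid multibyte string\n");
                return pos;
            }
            if(l==0)
                l=1; /* a NUL byte; count it so the loop advances */
            if(prev+(int)l>=pos)
                return prev;
            prev+=(int)l;
        }
    }
}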
/* Guess whether the current locale uses an 8-bit encoding or UTF-8. */
void test_ctype(void)
{
    int i;
    char chs[2]=" ";
    wchar_t wc;
    char *p;

    /* If every byte value converts as a one-byte string, assume 8-bit. */
    for(i=0; i<256; i++){
        chs[0]=i;
        if(mbtowc(&wc, chs, 2)==-1){
            fprintf(stderr, "Doesn't look like an 8-bit encoding\n");
            break;
        }
    }

    if(i==256){
        fprintf(stderr, "Looks like an 8-bit encoding\n");
        e_8bit=TRUE;
        return;
    }

    /* Otherwise check the codeset name; spellings vary between systems. */
    p=nl_langinfo(CODESET);
    fprintf(stderr, "%s\n", p);
    if(strcmp(p, "UTF-8")==0 || strcmp(p, "UTF8")==0){
        fprintf(stderr, "Looks like UTF-8\n");
        e_utf8=TRUE;
    }
}
int main(void)
{
    char *p;

    p=setlocale(LC_ALL, "");
    if(!p){
        /* setlocale does not set errno, so perror would be misleading. */
        fprintf(stderr, "setlocale failed\n");
        return EXIT_FAILURE;
    }

    test_ctype();

    return EXIT_SUCCESS;
}
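
The UTF-8 branch of str_prev_off has a simple forward counterpart, which
is roughly what "next char" code for UTF-8 tends to look like. This is a
minimal sketch under the assumption that p is a NUL-terminated UTF-8
string and pos points at the start of a character; utf8_next_off is a
name made up for the example.

```c
/* Hypothetical forward counterpart of the UTF-8 branch of str_prev_off:
 * return the byte offset of the character following the one at pos.
 * Stays at pos when pos is already at the terminating NUL. */
int utf8_next_off(const char *p, int pos)
{
    if(p[pos]=='\0')
        return pos;

    pos++;
    /* Skip continuation bytes (10xxxxxx); the NUL terminator is not one,
     * so the loop cannot run past the end of the string. */
    while(((unsigned char)p[pos]&0xC0)==0x80)
        pos++;

    return pos;
}
```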